Preface



This volume contains the proceedings of the conference on Computer-Aided
Verification (CAV 2001), held in Paris, Palais de la Mutualité, July 18-22, 2001.
CAV 2001 was the 13th in a series of conferences dedicated to the advancement
of the theory and practice of computer-assisted formal analysis methods
for software and hardware systems. The CAV conference covers the spectrum
from theoretical results to concrete applications, with an emphasis on practical
verification tools and the algorithms and techniques needed for their implementation.

The program of CAV 2001 consisted of:

• 2 tutorials, respectively by David Basin on “Monadic Logics on Strings and
Trees” and by Pascal Van Hentenryck on “Constraint Solving Techniques”;

• 2 invited conference presentations by David Parnas on “Software Documentation
and the Verification Process” and Xavier Leroy on “Java Bytecode
Verification: An Overview”;

• 33 regular paper presentations, which constitute the core of this volume. The
accepted papers were selected from 106 regular paper submissions. Each
submission received an average of 4 referee reviews.

• 13 tool presentations, whose descriptions can also be found in this volume.
The tool presentations were selected from among 27 submissions, and were
reviewed in the same way as the regular papers. For each tool presentation,
there was also a demo at the conference. The increasing number of tool
submissions and presentations shows both the liveliness of the field and its
applied flavor.

In addition, there were five satellite workshops on Inspection in Software
Engineering, Logical Aspects of Cryptographic Protocol Verification, Runtime
Verification, Software Model Checking, and Supervisory Control of Discrete Event
Systems. The publication of these workshops' proceedings was managed by their
respective chairs, independently of the present proceedings.

The CAV conference was colocated with the related Static Analysis Symposium,
to enable participants to attend both.

We would like to thank here the numerous external reviewers who helped to
set up a high quality program and whose names appear in the list below.
Software Documentation
and the Verification Process 



David Lorge Parnas 



Department of Computing and Software 
Faculty of Engineering, McMaster University 
Hamilton, Ontario, Canada L8S 4L7 
parnas@qusunt.cas.mcmaster.ca



Abstract. In the verification community it is assumed that one has 
a specification of the program to be proven correct. In practice this is 
never true. Moreover, specifications for realistic software products are 
often unreadable when formalised. This talk will present and discuss 
more practical formal notation for software documentation and the role 
of such documentation in the verification process. 
Kedar S. Namjoshi

Abstract. Model Checking is an algorithmic technique to determine
whether a temporal property holds of a program. For linear time prop- 
erties, a model checker produces a counterexample computation if the 
check fails. This computation acts as a “certificate” of failure, as it can 
be checked easily and independently of the model checker by simulating 
it on the program. On the other hand, no such certificate is produced if 
the check succeeds. In this paper, we show how this asymmetry can be 
eliminated with a certifying model checker. The key idea is that, with 
some extra bookkeeping, a model checker can produce a deductive proof 
on either success or failure. This proof acts as a certificate of the result, as 
it can be checked mechanically by simple, non-fixpoint methods that are 
independent of the model checker. We develop a deductive proof system 
for verifying branching time properties expressed in the mu-calculus, and 
show how to generate a proof in this system from a model checking run. 
Proofs for linear time properties form a special case. A model checker 
that generates proofs can be used for many interesting applications, such 
as better ways of exploring errors in a program, and a tight integration 
of model checking with automated theorem proving. 



1 Introduction 

Model Checking [CE81,QS82] is an algorithmic technique to determine whether
a temporal property holds of a program. Perhaps the most useful property of the 
model checking algorithm is that it can generate a counterexample computation 
if a linear time property fails to hold of the program. This computation acts as 
a “certificate” of failure, as it can be checked easily and efficiently by a method 
independent of model checking - i.e., by simulating the program to determine 
whether it can generate the computation. On the other hand, if it is determined 
that a property holds, model checkers produce only the answer “yes”! This does 
not inspire the same confidence as a counterexample; one is forced to assume 
that the model checker implementation is correct. It is desirable, therefore, to 
provide a mechanism that generates certificates for either outcome of the model 
checking process. These certificates should be easily checkable by methods that 
are independent of model checking. 

In this paper, we show how such a mechanism, which we call a certifying 
model checker, can be constructed. The key idea is that, with some extra book- 
keeping, a model checker can produce a deductive proof on either success or 
failure. The proof acts as a certificate of the result, since it can be checked independently
using simple, non-fixpoint methods. A certifying model checker thus
provides a bridge from the “model-theoretic” to the “proof-theoretic” approach 
to verification EEinn]. 

We develop a deductive proof system for verifying mu-calculus properties of 
programs, and show it to be sound and relatively complete. We then show how to 
construct a deductive proof from a model checking run. This is done by storing
and analyzing sets of states that are generated by the fixpoint computations 
performed during model checking. The proof system and the proof generation 
process draw upon results in [EJ91] and [EJS93], which relate model checking
for the mu-calculus to winning parity games. A prototype implementation of a 
proof generator and proof checker for linear time properties has been developed 
for the COSPAN symbolic model checker. 

The ability to generate proofs which justify the outcome of model checking 
makes possible several interesting applications. For instance, 

- A certifying model checker produces a proof of property $f$ on success, and
a proof of $\neg f$ on failure. The proof of $\neg f$ is a compact representation of all
possible counterexample computations. As is shown later, it can be exponen- 
tially more succinct than a single computation. Particular counterexample 
computations can be “unfolded” out of the proof by an interactive process 
which provides a better understanding of the flaws in the program than is 
possible with a single computation. 

- Producing a deductive proof makes it possible to tightly integrate a certifying
model checker into an automated theorem prover. For instance, the
theorem prover can handle meta-reasoning necessary for applying compo- 
sitional or abstraction methods, while checking subgoals with a certifying 
model checker. The proofs produced by the model checker can be composed 
with the other proofs to form a single, checkable, proof script. 

The paper is organized as follows. Section 2 contains background information
on model checking and parity games. Section 3 develops the deductive proof system
for verifying mu-calculus properties, and Section 4 shows how such proofs
can be generated by slightly modifying a mu-calculus model checker. Applications
for certifying model checkers are discussed in detail in Section 5. Section 6
concludes the paper with a discussion of related work.

2 Preliminaries 

In this section, we define the mu-calculus and alternating tree automata, and 
show how mu-calculus model checking can be reduced to determining winning 
strategies in parity games. 



2.1 The Mu-Calculus

The mu-calculus [Koz82] is a branching time temporal logic that subsumes
[EL86] commonly used logics such as LTL, $\omega$-automata, CTL, and CTL*. The
logic is parameterized with respect to two sets: $\Sigma$ (state labels) and $\Gamma$ (action
labels). There is also a set of variable symbols, $V$. Formulas of the logic are
defined using the following grammar, where $l$ is in $\Sigma$, $a$ is in $\Gamma$, $Z$ is in $V$, and
$\mu$ is the least fixpoint operator.

$$\Phi ::= l \mid Z \mid \langle a \rangle \Phi \mid \neg\Phi \mid \Phi \wedge \Phi \mid (\mu Z : \Phi)$$

To simplify notation, we assume that $\Sigma$ and $\Gamma$ are fixed in the rest of the
paper. A formula must have each variable under the scope of an even number of
negation symbols. A formula is closed iff every variable in it is under the scope
of a $\mu$ operator. Formulas are evaluated over labeled transition systems (LTS’s)
[Kel76]. An LTS is a tuple $(S, s_0, R, L)$, where $S$ is a non-empty set of states,
$s_0 \in S$ is the initial state, $R \subseteq S \times \Gamma \times S$ is the transition relation, and $L : S \to \Sigma$
is a labeling function on states. We assume that $R$ is total; i.e., for any $s$ and $a$,
there exists $t$ such that $(s,a,t) \in R$. The evaluation of a formula $f$, represented
as $\|f\|_c$, is a subset of $S$, and is defined relative to a context $c$ mapping variables
to subsets of $S$. The evaluation rule is given below.



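- $\|l\|_c = \{s \mid s \in S \wedge L(s) = l\}$, $\|Z\|_c = c(Z)$,

- $\|\langle a \rangle \Phi\|_c = \{s \mid (\exists t : R(s,a,t) \wedge t \in \|\Phi\|_c)\}$,

- $\|\neg\Phi\|_c = S \setminus \|\Phi\|_c$, $\|\Phi_1 \wedge \Phi_2\|_c = \|\Phi_1\|_c \cap \|\Phi_2\|_c$,

- $\|(\mu Z : \Phi)\|_c = \bigcap \{T : T \subseteq S \wedge \|\Phi\|_{c[Z \leftarrow T]} \subseteq T\}$, where $c[Z \leftarrow T]$ is the
context $c'$ where, for any $X$, $c'(X)$ is $T$ if $X = Z$, and $c(X)$ otherwise.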
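
The evaluation rule above can be illustrated on a finite LTS. The following is a minimal sketch, not the paper's implementation: the tuple encoding of formulas, the function names, and the small example LTS are all assumptions made for illustration. The least fixpoint is computed by iterating from the empty set, which on a finite LTS with a monotone body coincides with the intersection of pre-fixpoints used in the definition.

```python
def evaluate(phi, states, trans, label, ctx=None):
    """Return the set ||phi||_c over a finite LTS.

    Formulas are nested tuples (an encoding assumed for this sketch):
    ("label", l), ("var", Z), ("dia", a, phi), ("not", phi),
    ("and", phi1, phi2), ("mu", Z, phi).
    trans is a set of (s, a, t) triples, label maps states to labels,
    and ctx maps variables to sets of states (unbound variables are empty).
    """
    ctx = ctx or {}
    states = frozenset(states)
    kind = phi[0]
    if kind == "label":
        return frozenset(s for s in states if label[s] == phi[1])
    if kind == "var":
        return frozenset(ctx.get(phi[1], frozenset()))
    if kind == "dia":
        target = evaluate(phi[2], states, trans, label, ctx)
        return frozenset(s for (s, a, t) in trans if a == phi[1] and t in target)
    if kind == "not":
        return states - evaluate(phi[1], states, trans, label, ctx)
    if kind == "and":
        return (evaluate(phi[1], states, trans, label, ctx)
                & evaluate(phi[2], states, trans, label, ctx))
    if kind == "mu":
        # Iterate from the empty set until the approximation stabilizes.
        current = frozenset()
        while True:
            nxt = evaluate(phi[2], states, trans, label, {**ctx, phi[1]: current})
            if nxt == current:
                return current
            current = nxt
    raise ValueError(f"unknown formula kind: {kind!r}")


def disj(phi1, phi2):
    """Disjunction derived from negation and conjunction."""
    return ("not", ("and", ("not", phi1), ("not", phi2)))


# Hypothetical example: EF(p) = (mu Y : p or <tau>Y) on a four-state LTS.
STATES = {0, 1, 2, 3}
TRANS = {(0, "tau", 1), (1, "tau", 2), (2, "tau", 2), (3, "tau", 3)}
LABEL = {0: "q", 1: "q", 2: "p", 3: "q"}
EF_P = ("mu", "Y", disj(("label", "p"), ("dia", "tau", ("var", "Y"))))
```

In this example, states 0, 1 and 2 can reach the p-labeled state 2, while state 3 only loops on itself, so `evaluate(EF_P, STATES, TRANS, LABEL)` contains exactly the first three states.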



A state $s$ in the LTS satisfies a closed mu-calculus formula $f$ iff $s \in \|f\|_\bot$, where
$\bot$ maps every variable to the empty set. The LTS satisfies $f$ iff $s_0$ satisfies $f$.
Mu-calculus formulas can be converted to positive normal form by introducing
the operators $\Phi_1 \vee \Phi_2 = \neg(\neg(\Phi_1) \wedge \neg(\Phi_2))$, $[a]\Phi = \neg\langle a \rangle(\neg\Phi)$ and
$(\nu Z : \Phi) = \neg(\mu Z : \neg\Phi(\neg Z))$, and using de Morgan rules to push negations inwards. The
result is a formula where negations are applied only to elements of $\Sigma$.

Mu-Calculus Signatures: Consider a closed mu-calculus formula $f$ in positive
normal form, where the $\mu$-variables are numbered $Y_1, \ldots, Y_n$ in such a way that
if $(\mu Y_i)$ occurs in the scope of $(\mu Y_j)$ then $j < i$. Streett and Emerson
show that, with every state $s$ of an LTS $M$ that satisfies $f$, one can associate a
lexicographically minimum $n$-vector of ordinals called its signature, denoted by
$sig(s,f)$. Informally, $sig(s,f)$ records the minimum number of unfoldings of least
fixpoint operators that are necessary to show that $s$ satisfies $f$. For example, for
the CTL property $EF(p) = (\mu Y_1 : p \vee \langle\tau\rangle Y_1)$, $sig(s, EF(p))$ is the length of the
shortest $\tau$-path from $s$ to a state satisfying $p$.
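
As an illustration of the EF(p) example, the sketch below computes, for each state satisfying EF(p), the least index k at which the state enters the k-th approximation of the least fixpoint. The function name and the example LTS are assumptions for this sketch. Since the approximation of index 0 is empty, a state labeled p gets index 1, so the index counts the states on a shortest path to a p-state.

```python
def ef_signature(states, trans, label, p, action="tau"):
    """Map each state satisfying EF(p) to the least k with the state in Y^k.

    Y^0 is empty, and Y^(k+1) holds the p-labeled states together with
    the action-predecessors of Y^k.
    """
    approx = set()  # Y^0
    sig = {}
    k = 0
    while True:
        k += 1
        nxt = {s for s in states if label[s] == p}
        nxt |= {s for (s, a, t) in trans if a == action and t in approx}
        if nxt == approx:
            return sig  # fixpoint reached; unreachable states get no signature
        for s in nxt - approx:
            sig[s] = k
        approx = nxt
```

States that cannot reach a p-state never enter an approximation and therefore have no entry in the result.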

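Formally, for an $n$-vector of ordinals $v$, let $f^v$ be a formula with the semantics
defined below. Then $sig(s,f)$ is the smallest $n$-vector $v$ such that $s \in \|f^v\|_\bot$.
First, define the new operator $\mu^k$, for an ordinal $k$, with the semantics
$\|(\mu^k Y : \Phi)\|_c = Y^k$, where $Y^0 = \emptyset$, $Y^{i+1} = \|\Phi\|_{c[Y \leftarrow Y^i]}$, and for a limit ordinal
$\lambda$, $Y^\lambda = (\bigcup k : k < \lambda : Y^k)$.

$$\begin{aligned}
&\|l^v\|_c = \|l\|_c, \quad \|(\neg l)^v\|_c = \|\neg l\|_c, \quad \|Z^v\|_c = \|Z\|_c, \\
&\|(\langle a \rangle \Phi)^v\|_c = \|\langle a \rangle (\Phi^v)\|_c, \quad \|([a]\Phi)^v\|_c = \|[a](\Phi^v)\|_c, \\
&\|(\Phi_1 \wedge \Phi_2)^v\|_c = \|\Phi_1^v \wedge \Phi_2^v\|_c, \quad \|(\Phi_1 \vee \Phi_2)^v\|_c = \|\Phi_1^v \vee \Phi_2^v\|_c, \\
&\|(\mu Y_j : \Phi)^v\|_c = \|\Phi^v\|_{c'}, \text{ where } c' = c[Y_j \leftarrow \|(\mu^{v_j} Y_j : \Phi)\|_c], \\
&\|(\nu Z : \Phi)^v\|_c = \|\Phi^v\|_{c'}, \text{ where } c' = c[Z \leftarrow \|(\nu Z : \Phi)\|_c].
\end{aligned}$$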
2.2 Alternating Automata and Parity Games

An alternating automaton is another way of specifying branching time temporal
properties. For sets $\Sigma$ and $\Gamma$ of state and transition labels respectively, an alternating
automaton is specified by a tuple $(Q, q_0, \delta, F)$, where $Q$ is a non-empty
set of states, $q_0 \in Q$ is the initial state, and $\delta$ is a transition function mapping
a pair from $Q \times \Sigma$ to a positive boolean expression formed using the operators
$\wedge$, $\vee$ applied to elements of the form true, false, $q$, $\langle a \rangle q$ and $[a]q$, where
$a \in \Gamma$, and $q \in Q$. $F$ is a parity acceptance condition, which is a non-empty list
$(F_0, F_1, \ldots, F_n)$ of subsets of $Q$. An infinite sequence over $Q$ satisfies $F$ iff the
smallest index $i$ for which a state in $F_i$ occurs infinitely often on the sequence
is even. For simplicity, we assume that the transition relation of the automaton
is in a normal form, where $F$ is a partition of $Q$, and $\delta(q,l)$ has one of the
following forms: $q_1 \wedge q_2$, $q_1 \vee q_2$, $\langle a \rangle q_1$, $[a]q_1$, true, false. Converting an arbitrary
automaton to an equivalent automaton in normal form can be done with a linear
blowup in the size.
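
The parity acceptance condition can be checked directly on ultimately periodic sequences, where the states occurring infinitely often are exactly those on the repeated cycle. The sketch below is an illustration under that assumption; the function name and the representation of the sequence by its cycle are choices made here, not notation from the paper.

```python
def satisfies_parity(cycle, parity_sets):
    """Check the parity condition for a sequence that eventually repeats `cycle`.

    parity_sets is the list (F_0, ..., F_n). The sequence is accepted iff
    the smallest index i such that F_i meets the cycle is even.
    """
    recurring = set(cycle)
    if not recurring:
        raise ValueError("the cycle must be non-empty")
    for index, f_i in enumerate(parity_sets):
        if recurring & set(f_i):
            return index % 2 == 0
    # Cannot happen when the F_i partition Q; treated as rejection here.
    return False
```

For instance, with $F_0 = \{q_0\}$ and $F_1 = \{q_1\}$, a sequence that visits $q_0$ infinitely often is accepted, while one that eventually stays in $q_1$ is rejected.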

A tree is a prefix-closed subset of $\mathbb{N}^*$, where $\lambda$, the empty sequence, is called
the root of the tree. A labeled tree $t$ is a tree together with two functions
$N_t : t \to \Sigma$ and $F_t : edge(t) \to \Gamma$, where $edge(t) = \{(x, x.i) \mid x \in t \wedge x.i \in t\}$. We
require the transition relation of such a tree to be total.

The acceptance of a labeled tree $t$ by the automaton is defined in terms of
a two-player infinite game. A configuration of the game is a pair $(x,q)$, where
$x$ is a node of the tree and $q$ is an automaton state. If $\delta(q, N_t(x))$ is true,
player I wins, while player II wins if it is false. For the other cases, player I
chooses one of $q_1$, $q_2$ if it is $q_1 \vee q_2$, and chooses an $a$-successor to $x$ if it is
$\langle a \rangle q_1$. Player II makes similar choices at the $\wedge$ and $[a]$ operators. The result
is a new configuration $(x',q')$. A play of the game is a maximal sequence of
configurations generated in this manner. A play is winning for player I iff either
it is finite and ends in a configuration that is a win for I, or it is infinite and
satisfies the automaton acceptance condition. The play is winning for player II
otherwise. A strategy for player I (II) is a partial function that maps every finite
sequence of configurations and intermediate choices to a choice at each player I
(II) position. A winning strategy for player I is a strategy function where every
play following that strategy is winning for I, regardless of the strategy for II.
The automaton accepts the tree $t$ iff player I has a winning strategy for the game
starting at $(\lambda, q_0)$. An LTS $M$ satisfies the automaton iff the automaton accepts
the computation tree of $M$.

Theorem 0. [EJ91,JW95] For any closed mu-calculus formula $f$, there is a
linear-size alternating automaton $A_f$ such that for any LTS $M$, $M$ satisfies $f$ iff $M$
satisfies $A_f$. The automaton is derived from the parse graph of the formula.

A strategy $s$ is history-free iff the outcome of the function depends only on
the last element of the argument sequence. By results in [EJ91], parity games
are determined (one of the players has a winning strategy), and the winner has
a history-free winning strategy. From these facts, winning in the parity game
generated by an LTS $M = (S, s_0, R, L)$ and an automaton $A = (Q, q_0, \delta, F)$
can be cast as model checking on a product LTS, $M \times A$, of configurations
[EJS93]. The LTS $M \times A = (S', s'_0, R', L')$ is defined over state labeling
$\Sigma' = \{I, II, win_I, win_{II}\} \times \{f_0, \ldots, f_n\}$ and edge labeling $\Gamma' = \{\tau\}$, and has
$S' = S \times Q$ and $s'_0 = (s_0, q_0)$. The first component of $L'(s,q)$ is $I$ if $\delta(q, L(s))$ has the
form $q_1 \vee q_2$ or $\langle a \rangle q_1$, $II$ if it has the form $q_1 \wedge q_2$ or $[a]q_1$, $win_I$ if it has the form
true, and $win_{II}$ if it has the form false. The second component is $f_i$ iff $q \in F_i$.
$R'$ is defined as follows. For a state $(s,q)$, if $\delta(q, L(s))$ is true or false, then $(s,q)$
has no successors; if $\delta(q, L(s))$ is $q_1 \vee q_2$ or $q_1 \wedge q_2$, then $(s,q)$ has two successors
$(s,q_1)$ and $(s,q_2)$; if $\delta(q, L(s))$ is $\langle a \rangle q_1$ or $[a]q_1$, then $(s,q)$ has a successor $(t,q_1)$
for every $t$ such that $R(s,a,t)$ holds, and no other successors.

Let $W_I = (\sigma_0 Z_0 \ldots \sigma_n Z_n : \Phi_I(Z_0, \ldots, Z_n))$, where $\sigma_i$ is $\nu$ if $i$ is even and $\mu$
otherwise, and

$$\Phi_I(Z_0, \ldots, Z_n) = win_I \vee (I \wedge (\bigwedge i : f_i \Rightarrow \langle\tau\rangle Z_i)) \vee (II \wedge (\bigwedge i : f_i \Rightarrow [\tau] Z_i)).$$

The formula $W_I$ describes the set of configurations from which
player I has a winning strategy. Similarly, player II has a winning strategy from
the complementary set $W_{II}$, where $W_{II} = (\delta_0 Z_0 \ldots \delta_n Z_n : \Phi_{II}(Z_0, \ldots, Z_n))$,
where $\delta_i$ is $\mu$ if $i$ is even, and $\nu$ otherwise, and

$$\Phi_{II}(Z_0, \ldots, Z_n) = win_{II} \vee (I \wedge (\bigwedge i : f_i \Rightarrow [\tau] Z_i)) \vee (II \wedge (\bigwedge i : f_i \Rightarrow \langle\tau\rangle Z_i)).$$

Theorem 1. (cf. [EJ91]) For an LTS $M$ and a normal form automaton $A$ of
the form above, $M$ satisfies $A$ iff $M \times A, (s_0, q_0) \models W_I$.



3 The Proof System 



Deductive proof systems for verifying sequential programs rely on the two key
concepts of invariance (e.g., loop invariants) and progress (e.g., rank functions,
variant functions) [Flo67,Hoa69]. These concepts reappear in deductive verification
systems for linear temporal logic [MP83,MP87,CM88], and also form the
basis for the proof system that is presented below.

Suppose that $M = (S, s_0, R, L)$ is an LTS, and $A = (Q, q_0, \delta, F)$ is a normal
form automaton, where $F = (F_0, F_1, \ldots, F_{2n})$. To show that $M$ satisfies $A$, one
exhibits (i) for each automaton state $q$, a predicate (the invariant) $\phi_q$ over $S$,
expressed in some assertion language, (ii) non-empty, well founded sets $W_1, \ldots, W_n$
with associated partial orders $\prec_1, \ldots, \prec_n$, and (iii) for each automaton state $q$,
a partial rank function $\rho_q : S \to (W, \prec)$, where $W = W_1 \times \ldots \times W_n$ and $\prec$ is
the lexicographic order defined on $W$ using the $\{\prec_i\}$ orders.

We extend the $\prec$ order to apply to elements $a, b$ in $W_1 \times \ldots \times W_k$, for some
$k < n$ by $a \prec b$ iff $(a_1, \ldots, a_k, 0, 0, \ldots, 0) \prec (b_1, \ldots, b_k, 0, 0, \ldots, 0)$, where we
assume, without loss of generality, that $0$ is an element common to all the $W_i$'s.
For an automaton state $q$, define the relation $\lhd_q$ over $W \times W$ as follows. For
any $a, b$, $a \lhd_q b$ holds iff for the (unique, since $F$ is a partition) index $k$ such
that $q \in F_k$, either $k = 0$, or $k > 0$, $k = 2i$ and $(a_1, \ldots, a_i) \preceq (b_1, \ldots, b_i)$,
or $k = 2i - 1$ and $(a_1, \ldots, a_i) \prec (b_1, \ldots, b_i)$. We use the label $l$ to denote the
predicate $l(s) \equiv (L(s) = l)$, and the notation $[f]$ to mean that the formula $f$ is
valid. Note that, in the following, $\langle a \rangle$ and $[a]$ are operators interpreted on $M$. The
invariants and rank function must satisfy the following three local conditions. In
these conditions, the variable $k$ has type $W$.



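- Consistency: For each $q \in Q$, $[\phi_q \Rightarrow (\exists k : (\rho_q = k))]$ ($\rho_q$ is defined for
every state in $\phi_q$)

- Initiality: $\phi_{q_0}(s_0)$ (the initial state satisfies its invariant)

- Invariance and Progress: For each $q \in Q$, and $l \in \Sigma$, depending on the
form of $\delta(q,l)$, check the following.

• true: there is nothing to check.

• false: $[\phi_q \Rightarrow \neg l]$ holds,

• $q_1 \wedge q_2$: $[\phi_q \wedge l \wedge (\rho_q = k) \Rightarrow (\phi_{q_1} \wedge (\rho_{q_1} \lhd_q k)) \wedge (\phi_{q_2} \wedge (\rho_{q_2} \lhd_q k))]$

• $q_1 \vee q_2$: $[\phi_q \wedge l \wedge (\rho_q = k) \Rightarrow (\phi_{q_1} \wedge (\rho_{q_1} \lhd_q k)) \vee (\phi_{q_2} \wedge (\rho_{q_2} \lhd_q k))]$

• $\langle a \rangle q_1$: $[\phi_q \wedge l \wedge (\rho_q = k) \Rightarrow \langle a \rangle(\phi_{q_1} \wedge (\rho_{q_1} \lhd_q k))]$

• $[a]q_1$: $[\phi_q \wedge l \wedge (\rho_q = k) \Rightarrow [a](\phi_{q_1} \wedge (\rho_{q_1} \lhd_q k))]$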
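
The rank comparison used in the Invariance and Progress conditions can be made concrete when every $W_i$ is the set of natural numbers, which is an assumption of this sketch rather than a requirement of the proof system. Ranks are then tuples of integers, and Python's built-in tuple comparison supplies the lexicographic order. The function name `rank_precedes` is hypothetical.

```python
def rank_precedes(a, b, k):
    """Decide the rank relation for a state q lying in F_k.

    a and b are rank vectors (tuples of naturals); k is the index of the
    parity set containing q. For k = 0 there is no constraint; for k = 2i
    the first i components must not increase; for k = 2i - 1 they must
    decrease strictly, both in the lexicographic order.
    """
    if k == 0:
        return True
    i = (k + 1) // 2  # k = 2i and k = 2i - 1 both give this i
    if k % 2 == 0:
        return a[:i] <= b[:i]
    return a[:i] < b[:i]
```

A proof checker for a finite-state program would evaluate this relation with `a` the rank of the successor configuration and `b` the rank `k` bound in the corresponding condition.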

Theorem 2. (Soundness) The proof system is sound. 

Proof. Given a proof in the format above, we have to show that $M$ satisfies $A$.
We do so by exhibiting a winning strategy for player I in the parity game. For
a configuration $(s,q)$, let $\rho_q(s)$ be its associated rank. Inductively, assume that
at any configuration $(s,q)$ on a play, $\phi_q(s)$ is true. This holds at the start of the
game by the Initiality requirement. Suppose that $L(s) = l$. Based on the form
of $\delta(q,l)$, we have the following cases:

- true: the play terminates with a win for player I,

- false: this case cannot arise, as the inductive invariant contradicts the proof
assertion $[\phi_q \Rightarrow \neg l]$.

- $q_1 \wedge q_2$, $[a]q_1$: Player II plays at this point, with the new configuration
satisfying the inductive hypothesis by the proof.

- $q_1 \vee q_2$: Player I chooses the $q_i$ for which the $\vee$ proof assertion holds. The
new configuration $(s,q_i)$ thus satisfies the inductive hypothesis.

- $\langle a \rangle q_1$: Player I chooses the $a$-successor $t$ of $s$ which is a witness for the $\langle a \rangle$
formula. Hence, $\phi_{q_1}(t)$ holds.

Thus, a finite play terminates with $\delta(q,l) =$ true, which is a win for player I.
In an infinite play, by the definition of $\lhd_q$, whenever the play goes through a
configuration $(s,q)$ with $q$ in an odd-indexed set $F_{2i-1}$, the rank decreases strictly
in the positions $1..i$, and the only way it can increase in these components is if
the play later goes through a configuration $(s',q')$ with $q'$ in an even indexed
set of smaller index. So, if an odd indexed set occurs infinitely often, some even
indexed set with smaller index must also occur infinitely often, which implies
that the smallest index that occurs infinitely often must be even. Thus, the
defined strategy is winning for player I, so $M$ satisfies $A$. □

Theorem 3. (Completeness) The proof system is relatively complete. 

Proof. We show completeness relative to the expressibility of the winning sets,
as is done for Hoare-style proof systems for sequential programs [Coo78]. Assume
that $M$ satisfies $A$. By Theorem 1, $M \times A, (s_0, q_0) \models W_I$. The history-free
winning strategy for player I corresponds to a sub-structure $N$ of $M \times A$, which
has a single outgoing edge at each player I state.

For each automaton state $q$, let $\phi_q(s) = (M \times A, (s,q) \models W_I)$. The rank
function is constructed from the mu-calculus signatures of states satisfying the
formula $W_I$. For each automaton state $q$, let the function $\rho_q$ have domain $\phi_q$. For
every state $(s,q)$ satisfying $W_I$, let $\rho_q(s)$ be the $n$-vector that is the signature of
$W_I$ at $(s,q)$.

We now show that all the conditions of the proof rule are satisfied for these
choices. Consistency holds by the definition of the $\rho_q$ functions. Initiality holds by
the definition of $\phi_{q_0}$, since $(s_0, q_0)$ satisfies $W_I$. From the definition of signatures
and the shape of the formula defining $W_I$, it is not difficult to show that at each
transition from a state in $N$, the signature for $W_I$ decreases strictly in the first $i$
components if the state is in $F_{2i-1}$, and is non-increasing in the first $i$ components
if the state is in $F_{2i}$. This corresponds directly to the progress conditions in the
proof rule. For each state $(s,q)$ in $N$, $\phi_q(s)$ is true, so the invariance conditions
also hold. If $\delta(q,l)$ is false, then for any state $s$ with $L(s) = l$, $(s,q)$ represents
a win for player II, so that $(s,q) \not\models W_I$, and $\phi_q \wedge l$ is unsatisfiable, as desired. □

Proofs for Linear Time Properties: Manna and Pnueli [MP87] show that
every $\omega$-regular linear time property can be represented by a $\forall$-automaton, which
accepts an $\omega$-string iff all runs of the automaton on the string satisfy a co-Büchi
acceptance condition. Model checking the linear time property $h$ is equivalent to
checking the branching time property $A(h)$. By the $\forall$ nature of acceptance, the $A$
quantifier can, informally, be distributed through $h$, resulting in a tree automaton
where $\delta$ is defined using only $[a]$ and $\wedge$ operators. Specializing our proof system
to such automata results in a proof system similar to that in [MP87].

Proofs for LTS’s with Fairness: So far, we have only considered LTS’s without
fairness constraints. Fairness constraints, such as weak or strong fairness on
actions, are sometimes required to rule out undesired computations. Manna and
Pnueli [MP87] observe that there are two possible ways of handling fairness: one
can either incorporate the fairness constraints into the property, or incorporate 
them into the proof system. They point out that these approaches are closely 
related. Indeed, the modified proof system corresponds to a particular way of 
proving the modified property using the original proof system. Therefore, we 
prefer to keep the simplicity of the proof system, and incorporate any fairness 
constraints into the property. 



4 Proof Generation and Checking 

The completeness proof in Theorem 3 shows how to generate a proof for a successful
model checking attempt. Such proofs can be generated both by explicit-state
and symbolic model checkers. For symbolic model checkers, the invariant
assertions are represented by formulas (i.e., BDD’s), and it is desirable also for
the rank functions to be converted to predicates; i.e., to represent the terms
$(\rho_q = k)$ and $(\rho_{q_1} \lhd_q k)$ as the predicates $\rho_=(q,k)$ and $\rho_\lhd(q_1,q,k)$, respectively.
Individual proof steps become validity assertions in the assertion language which, 
for a finite-state model checker, is propositional logic. It is possible for the proof 
generator and the proof checker to use different symbolic representations and dif- 
ferent validity checking methods. For instance, the model checker can be based 
on BDD methods, while the proof checker represents formulas with syntax trees 
and utilizes a SAT solver to check validity. 
Since we use alternating automata to specify properties, the automaton that
defines $\neg f$ can be obtained easily by dualizing $A_f$: exchanging true and false,
$\wedge$ and $\vee$, and $\langle a \rangle$ and $[a]$ in the transition relation and replacing the parity
condition $F$ with its negation $(\emptyset, F_0, F_1, \ldots, F_{2n})$. A winning strategy for player I
with the dual automaton is a winning strategy for player II with the original
automaton. Thus, the set $W_{II} = \neg W_I$ can be used to construct a proof of $\neg f$
relative to the dual automaton. To avoid doing extra work to create a proof
for $\neg f$ on failure, it is desirable to record approximations for both the $\mu$ and $\nu$
variables while evaluating $W_I$: if $f$ holds, the approximations for the $\mu$ variables
are used to calculate the rank function; if not, a dual proof can be constructed
for $\neg f$, using the negations of the approximations recorded for the $\nu$ variables of
$W_I$, which are the $\mu$ variables in $W_{II}$. This strategy is followed in our prototype
proof generator for COSPAN.
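
The dualization step described above is mechanical for automata in normal form. The sketch below illustrates it under an assumed tuple encoding of transition expressions, with hypothetical function names; it is not taken from the COSPAN prototype.

```python
# Normal-form transition expressions, encoded as tuples (an assumption of
# this sketch): ("true",), ("false",), ("and", q1, q2), ("or", q1, q2),
# ("dia", a, q1) for <a>q1, and ("box", a, q1) for [a]q1.
_DUAL_TAG = {
    "true": "false", "false": "true",
    "and": "or", "or": "and",
    "dia": "box", "box": "dia",
}


def dualize(expr):
    """Swap true/false, and/or, and <a>/[a] in one transition expression."""
    return (_DUAL_TAG[expr[0]],) + tuple(expr[1:])


def dual_parity(parity_sets):
    """Negate a parity condition by shifting every index up by one."""
    return [frozenset()] + [frozenset(f_i) for f_i in parity_sets]


def dualize_automaton(delta, parity_sets):
    """Dualize a transition table {(q, l): expr} and its parity condition."""
    return ({key: dualize(expr) for key, expr in delta.items()},
            dual_parity(parity_sets))
```

Prepending the empty set moves each $F_i$ to index $i + 1$, so a sequence whose smallest recurring index was even now has an odd one, and vice versa.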

Example: To illustrate the proof generation process, consider the following
program and property. All transitions of the program are labeled with $\tau$.

Program $M(m : \mathbb{N})$ (* circular counter *)

var $c : (0..2^m - 1)$; initially $c = 0$; transition $c' = (c + 1) \bmod 2^m$

Property $A$ (* $AGF(c = 0)$, i.e., on all paths, $c = 0$ holds infinitely often *)
states $= \{q_0, q_1\}$; initially $q_0$;

transition $\delta(q_0, true) = [\tau]q_1$, $\delta(q_1, c = 0) = q_0$, $\delta(q_1, c \neq 0) = [\tau]q_1$
parity condition $(F_0, F_1)$, where $F_0 = \{q_0\}$, $F_1 = \{q_1\}$.




Fig. 1. The Graph of M x A for m = 2. 



The $W_I$ formula, as defined in Section 2.2, simplifies to the following, since
every state of $M \times A$ is a II-state: $W_I = (\nu Z_0 : (\mu Z_1 : \Phi_I(Z_0, Z_1)))$, where
$\Phi_I(Z_0, Z_1) = ((q_0 \Rightarrow [\tau]Z_0) \wedge (q_1 \Rightarrow [\tau]Z_1))$. This formula evaluates to true
on $M \times A$. Thus, $\phi_{q_0}$ and $\phi_{q_1}$, as calculated in the proof of Theorem 3, are
both true (i.e., $c \in \{0..2^m - 1\}$). The rank function is calculated by computing
the signatures of states satisfying $W_I$. As there is a single odd index in the
parity condition, the signature is a singleton vector, which may be represented
by a number. By the definition in Section 2.1, the signature of a state satisfying
$W_I$ is the smallest index $i$ for which the state belongs to $(\mu^i Z_1 : \Phi_I(W_I, Z_1))$.
This formula simplifies to $(\mu^i Z_1 : (q_0 \vee [\tau]Z_1))$, which essentially calculates the
distance to the $q_0$ state.

The italicized number next to each state in Figure 1 shows its rank. The rank
functions are, therefore, $\rho_{q_0}(c) = 1$ and $\rho_{q_1}(c) =$ if $(c = 0)$ then $2$ else $(6 - c)$. By
construction, the Consistency and Initiality properties of the proof are satisfied.
Instantiating the general proof scheme of Section 3, the Invariance and Progress
obligations reduce to the following, all of which can be seen to hold.

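- $[\phi_{q_0}(c) \wedge true \wedge (\rho_{q_0}(c) = k) \Rightarrow [\tau](\phi_{q_1}(c))]$

- $[\phi_{q_1}(c) \wedge (c = 0) \wedge (\rho_{q_1}(c) = k) \Rightarrow (\phi_{q_0}(c) \wedge (\rho_{q_0}(c) < k))]$, and

- $[\phi_{q_1}(c) \wedge (c \neq 0) \wedge (\rho_{q_1}(c) = k) \Rightarrow [\tau](\phi_{q_1}(c) \wedge (\rho_{q_1}(c) < k))]$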
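
The ranks and obligations of this example can be reproduced by a small explicit-state computation. The following sketch is illustrative only: the function names and the encoding of configurations as pairs `(c, q)` are assumptions, and the ranks are obtained by iterating the approximations of $(\mu Z_1 : (q_0 \vee [\tau]Z_1))$ on the product graph rather than by any symbolic method.

```python
def product_successors(c, q, m):
    """Successors of configuration (c, q) in M x A for the circular counter."""
    n = 2 ** m
    if q == "q0":                 # delta(q0, true) = [tau] q1
        return [((c + 1) % n, "q1")]
    if c == 0:                    # delta(q1, c = 0) = q0, no program step
        return [(0, "q0")]
    return [((c + 1) % n, "q1")]  # delta(q1, c != 0) = [tau] q1


def ranks(m):
    """Least i such that a configuration lies in the i-th approximation."""
    n = 2 ** m
    configs = [(c, q) for c in range(n) for q in ("q0", "q1")]
    rank, approx, i = {}, set(), 0
    while True:
        i += 1
        new = {cfg for cfg in configs
               if cfg[1] == "q0"
               or all(s in approx for s in product_successors(cfg[0], cfg[1], m))}
        if new == approx:
            return rank
        for cfg in new - approx:
            rank[cfg] = i
        approx = new


def obligations_hold(m):
    """Check the two progress obligations for q1 (q0 lies in F_0: no constraint)."""
    n = 2 ** m
    r = ranks(m)
    at_zero = r[(0, "q0")] < r[(0, "q1")]
    elsewhere = all(r[((c + 1) % n, "q1")] < r[(c, "q1")] for c in range(1, n))
    return at_zero and elsewhere
```

For $m = 2$, the computed ranks agree with the closed forms stated above: every $q_0$ configuration has rank 1, and the $q_1$ configurations have ranks 2, 5, 4, 3 for $c = 0, 1, 2, 3$.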

Proofs vs. Counterexamples: A natural question that arises concerns the
relationship between a proof for $\neg f$ and a counterexample computation for $f$.
This is elucidated in the theorem below.

Theorem 4. For a program $M$ and a linear time, co-Büchi $\forall$-automaton $A$, if
$M$ does not satisfy $A$, and $M \times A$ has $m$ bits and a counterexample of length
$n$, it is possible to construct a proof for $\neg A$ that needs $2mn$ bits. On the other
hand, a proof can be exponentially more succinct than any counterexample.

Proof. In general, a counterexample consists of a path to an accepting state,
and a cycle passing through that state. Define the invariants $\phi_q$ so that they hold
only of the states on the counterexample, and let the rank function measure the
distance along the counterexample to an accepting state. This can be represented
by BDD’s of size $2mn$.

On the other hand, consider the program in Figure 1 and the property
$G(c' > c)$. This is false only at $c = 2^m - 1$, so the shortest counterexample has
length $2^m + 1$. We can, however, prove failure by defining the invariant to be
true (really, $EF(c' < c)$), and by letting state $c$ have rank $k$ iff $c + k = 2^m - 1$.
This rank function measures the distance to the violating transition. It can be
represented by a BDD of size linear in $m$ by interleaving the bits for $c$ and $k$.
Thus, the proof has size linear in $m$ and is, therefore, exponentially more succinct
than the counterexample. □
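
The arithmetic behind the second half of this proof can be checked by brute force for small $m$. The sketch below is only a sanity check with hypothetical function names; it enumerates states explicitly and says nothing about BDD sizes.

```python
def rank(c, m):
    """Rank k of counter state c, defined by c + k = 2**m - 1."""
    return (2 ** m - 1) - c


def shortest_counterexample(m):
    """States visited from c = 0 up to the first step where c' <= c."""
    n = 2 ** m
    path, c = [0], 0
    while True:
        nxt = (c + 1) % n
        path.append(nxt)
        if nxt <= c:      # the step violating c' > c
            return path
        c = nxt


def rank_is_distance(m):
    """Rank drops by one per step and is zero exactly at the violating state."""
    n = 2 ** m
    steps_ok = all(rank(c + 1, m) == rank(c, m) - 1 for c in range(n - 1))
    return steps_ok and rank(n - 1, m) == 0
```

The counterexample grows as $2^m + 1$ states, while the rank is given by a single subtraction on $m$-bit numbers.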

5 Applications 

The ability to generate proofs which justify the outcome of model checking makes 
possible several interesting applications for a certifying model checker. 

1. Generating Proofs vs. Generating Counterexamples: We have shown
how a certifying model checker can produce a proof of property $f$ upon success
and a proof for $\neg f$ on failure. Both types of proofs offer insight on why the
property succeeds (or fails) to hold of the program. Inspecting success proofs
closely may help uncover vacuous justifications or lack of appropriate coverage.
The generated proof for $\neg f$ is a compact representation of all counterexample
computations. This proof can be “unfolded” interactively along the lines of the
strategy description in the soundness proof of Theorem 2. This process allows
the exploration of various counterexamples without having to perform multiple
model checking runs (cf. [SS98]).

2. Detecting Errors in a Model Checker: The proof produced by a certifying
model checker stands by itself; i.e., it can be checked for correctness
independently of the model checker. For instance, the model checker may use
BDD’s, but the proof can be checked using a SAT solver. It is possible, therefore,
to detect errors in the model checker^1. For instance, if the model checker
declares success but produces an erroneous proof, this may be due to a mistake 
in the implementation which results in a part of the state space being overlooked 
during model checking. 

3. Integrating Model Checking with Theorem Proving: Efforts to integrate
model checking with theorem proving [JS93,RSS95] have added such a
capability at a shallow level, where the result of model checking is accepted as an 
axiom by the theorem prover. This has been addressed in [YL97,Spr98], where
tableau proofs generated using explicit state model checkers are imported into 
theorem provers. Our proof generation procedure allows symbolic proofs, which 
are more compact than explicit state proofs, to be used for the same purpose. 

Theorem proving, in one form or another, has been used to design and verify
abstractions of infinite state systems (cf. [MNS99]), to prove conditions for sound
compositional reasoning (cf. [McM99]), and to prove parameterized systems correct
(cf. [BBC+00]). In the first two cases, model checking is applied to small
subgoals. Proofs generated by a certifying model checker for these subgoals can 
be composed with the other proofs to produce a single, mechanically checkable, 
proof script. In the last case, the model checker can be used to produce proofs 
about small instances of parameterized systems. The shape of the invariance 
and progress assertions in these proofs can often suggest the assertions needed 
for proving the general case, which is handled entirely with the theorem prover. 
This approach has been applied in [PRZ01, APR+01] to invariance properties. 

4. Proof Carrying Code: A certifying model checker can produce proofs 
for arbitrary temporal properties. These proofs can be used with the 
“proof-carrying-code” paradigm introduced in [NL96] for mobile code: a code producer 
sends code together with a generated correctness proof, which is checked by the 
code consumer. The proof generator in jlY I ^98j is tailored to checking memory 
and type safety. Using a certifying model checker, one can, in principle, gener- 
ate proofs of arbitrary safety and liveness properties, which would be useful for 
mobile protocol code. 

6 Conclusions and Related Work 

There is prior work on automatically generating explicit state proofs for proper- 
ties expressed in the mu-calculus and other logics, but the proof system and the 
algorithm of this paper appear to be the first to do so for symbolic representa- 
tions. In !KiclYL97j . algorithms are given to create tableau proofs in the style 
of [SW89]. In parallel with our work, Peled and Zuck [PZ01] have developed an 
algorithm for automatically generating explicit state proofs for LTL properties. 
The game playing algorithm of [SS98] implicitly generates a kind of proof. 

Explicit state proofs are of reasonable size only for programs with small 
state spaces. For larger programs, symbolic representations are to be preferred, 
as they result in proofs that are more compact. While the tableau proof system 
has been extended to symbolic representations in [BS92], the extension requires 
an external, global termination proof. In contrast, our proof system embeds the 
termination requirements as locally checkable assertions in the proof. 

Footnote to application 2 above: So a certifying model checker can be used to 
“certify” itself! 

The proof system presented here is closely related to those of [MP87] (for 
∀-automata) and (for fair-CTL), but generalizes both systems. The proof 
system is specifically designed to be locally checkable, so that proofs can be 
checked easily and mechanically. For some applications, it will be necessary to 
add rules such as modus ponens to make the proofs more “human-friendly”. As 
we have discussed, though, there are many possible applications for proofs that 
are generated and checked mechanically, which opens up new and interesting 
areas for the application of model checking techniques. 

1 Introduction 

The byte-code verifier is advertised as a key component of the security and 
safety strategy for the Java language, making it possible to use and exchange 
Java programs without fearing too much damage due to erroneous programs or 
malignant program providers. As Java is likely to become one of the languages 
used to embed programs in all kinds of appliances or computer-based applica- 
tions, it becomes important to verify that the claim of safety is justified. 

We worked on a type system proposed in [7] to enforce a discipline for object 
initialization in the Java Virtual Machine Language and implemented it in the 
Coq 0 proof and specification language. We first produced mechanically checked 
proofs of the theorems in [7] and then we constructed a functional 
implementation of a byte-code verifier. We have a mechanical proof that this byte-code 
verifier only accepts programs that have a safe behavior with respect to initial- 
ization. Thanks to the extraction mechanism provided in Coq ca. we obtain a 
program in CAML that can be directly executed on sample programs. 

A safe behavior with respect to initialization means that the fields of any 
object cannot be accessed before this object is initialized. To represent this, the 
authors of [7] distinguish between uninitialized objects, created by a new 
instruction, and initialized objects. Initialization is represented by an init instruction 
that replaces an uninitialized object with a new initialized object. Access to fields 
is represented abstractly by a use instruction, which operates only if the operand 
is an initialized object. Checking that initialization is properly respected means 
checking that use is never called with the main operand being an uninitialized 
object. 

There are two parts in this work. The first part simply consists of the 
mechanical verification of the claims appearing in [7]. This relies on a comparison 
between operational semantics rules and typing rules. In terms of manpower 
involved, this only required around three weeks of work. This shows that proof 
tools are now powerful enough to be used to provide mechanical verifications of 
theoretical ideas in programming language semantics (especially when semantic 
descriptions are given as sets of inference rules). 

The second part consists in producing a program, not described in [7], that 
satisfies the requirements described there. To develop this program, we have 
analyzed the various constraints that should be satisfied for each instruction 
and how these constraints could be implemented using a unification algorithm. 
In all, the experiment shows that the proof tool can also be used as a programming 
tool, with the advantage that logical reasoning can be performed on program 
units even before they are integrated in a completely functioning context. 

This paper is a short abstract of a paper published as an INRIA research 
report under the title A Coq formalization of a Type Checker for Object 
Initialization in the Java Virtual Machine [2]. 



1.1 Related Work 



Several teams around the world have been working on verifying formally that the 
properties of the Java language and its implementation suite make it a reasonably 
safe language. Some of the work done is based on pen-and-paper proofs that the 
principles of the language are correct, see for instance 

Closer to our concerns are the teams that use mechanical tools to verify the 
properties established about the formal descriptions of the language. A very ac- 
tive team in this field is the Bali team at University of Munich who is working 
on a comprehensive study of the Java language, its properties and its implemen- 
tation fl5li:ill9] using the Isabelle proof system m- Other work has been done 
with the formal method B and the associated tools P) , at Kestrel Institute using 
Specware |XI2()| . or in Nijmegen |9ll()f | using both PVS and Isabelle. 



2 Formalizing the Language and Type System 

2.1 Data-Types 

The formalization we studied is based on a very abstract and simplified descrip- 
tion of the Java Virtual Machine language. The various data-types manipulated 
in the programs are represented by abstract sets: ADDR for addresses in programs, 
VAR for variable names, integer for numeral values, T for classes. 

The type of classes being left abstract, the way objects of a given class are 
constructed is also left abstract. We actually rely on the minimal assumption 
that there is a family of types representing the values in each class. Such a 
family is represented by a function from T to the type of data-types, written in 
Coq as the following parameter to the specification. 

Parameter object_value : T -> Set. 

Since the objective of this study is initialization, there is a distinct family of sets 
for uninitialized objects of a given class: 

Parameter uninitialized_value : T -> Set. 

We also assume the existence of equality test functions for uninitialized objects. 

With all this, we express that the set of values manipulated by the abstract 
Java virtual machine is the disjoint sum of the set of integers, the set of object 
values and the set of uninitialized object values, with the following definition: 

Inductive value : Set := 
    int_val : integer -> value 
  | obj : (t:T) (object_value t) -> value 
  | un : (t:T) (a:ADDR) (uninitialized_value t) -> value. 

This inductive definition shows that a value can be in one of three cases, 
represented by three constructors. The first constructor, int_val, expresses that a 
value can be an integer; the second constructor, obj, expresses that a value can 
be an initialized object in some class t of T; the third constructor, un, expresses that 
a value can be an uninitialized object for some class t of T, and that this value is also 
tagged with an address. Note that this definition uses a feature called dependent 
types: viewed as a function, the constructor obj takes a first argument t in T 
and a second argument whose type depends on the first one: this type must be 
(object_value t). 
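
For readers less familiar with inductive types, the three constructors can be approximated in Python as follows. This is only a rough analogue under stated assumptions: the class names IntVal, Obj and Un and the payload field are hypothetical, classes are represented by strings, and Python cannot express the dependency of the payload type on t, so that part of the Coq definition is lost.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class IntVal:
    """Counterpart of int_val: an integer value."""
    n: int


@dataclass(frozen=True)
class Obj:
    """Counterpart of obj: an initialized object of class t."""
    t: str
    payload: Any


@dataclass(frozen=True)
class Un:
    """Counterpart of un: an uninitialized object of class t, tagged with an address."""
    t: str
    addr: int
    payload: Any


# The disjoint sum of the three cases.
Value = Union[IntVal, Obj, Un]


def is_initialized(v: Value) -> bool:
    """True for integers and initialized objects, False for uninitialized objects."""
    return not isinstance(v, Un)


print(is_initialized(Obj("Point", payload=0)))
print(is_initialized(Un("Point", addr=3, payload=0)))
```

Because the dataclasses are frozen and compared field by field, two uninitialized values of the same class created at different addresses are distinct values, which mirrors the address tag carried by the un constructor.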

The formal description of the Java Virtual Machine language as used in [7] 
boils down to a 10-constructor inductive type in the same manner. We named 
this type jvmli. 

2.2 Operational Semantics 

In [7], the operational semantics are given as a set of inference rules which 
describe the constraints that must hold between input and output data for each 
possible instruction. We handle judgments of the form 

\[ P \vdash \langle pc, f, s \rangle \rightarrow \langle pc', f', s' \rangle. \]

These judgments must be read as: for program P, one step of execution starting 
from the program counter pc, variable value description f, and stack s returns 
the new program counter pc', the new variable value description f', and stack s'. 
Stacks are simply represented as finite lists of objects of type value. To represent 
memory, we use functions, written f, f', from variable names (type VAR) to values 
(type value). 

For instance, the language has an instruction load that fetches a value in 
memory and places it on the stack. This is expressed with this inference rule: 

\[ \frac{P[pc] = \mathtt{load}\ x}{P \vdash \langle pc, f, s \rangle \rightarrow \langle pc + 1, f, f[x] \cdot s \rangle} \]

Special attention must be paid to the way initialization works. Initializing 
an object means performing a side-effect on this object, and all references to the 
object should view this side-effect. Thus, if several references to the same object 
have been copied in the memory, then all these references should perceive the 
effect of initialization. 

This is expressed with two rules. First, the new instruction always creates an 
object that is different from all objects already present in memory (in this rule, 
$\hat{A}^{\sigma}$ is a short notation for (uninitialized_value $\sigma$)). 

\[ \frac{P[pc] = \mathtt{new}\ \sigma \qquad a \in \hat{A}^{\sigma} \qquad \mathit{Unused}(a, f, s)}{P \vdash \langle pc, f, s \rangle \rightarrow \langle pc + 1, f, a \cdot s \rangle} \]

Second, all occurrences (i.e., references) of an object in memory are modified 
when it is initialized ($A^{\sigma}$ is a short notation for (object_value $\sigma$), $[a'/a]f$ is the 
function that maps x to f(x) if f(x) ≠ a and to a' if f(x) = a, and $[a'/a]s$ is 
the same stack as s except that instances of a have been replaced with a'). 

\[ \frac{P[pc] = \mathtt{init}\ \sigma \qquad a \in \hat{A}^{\sigma} \qquad a' \in A^{\sigma} \qquad \mathit{Unused}(a', f, s)}{P \vdash \langle pc, f, a \cdot s \rangle \rightarrow \langle pc + 1, [a'/a]f, [a'/a]s \rangle} \]

All these rules are easily expressed as constructors for inductive propositions, 
which are a common feature in many modern proof tools. 
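
To make the substitution performed by init concrete, the rules for load, new and init can be sketched as an executable step function. This is an illustration under assumptions, not the Coq development: values are encoded as tuples, freshness (the Unused premise) is obtained from a global counter, and a store instruction, which the rules above do not show, is added only so that a reference can be copied into a variable.

```python
import itertools

_fresh = itertools.count()      # source of identifiers never used before


def subst(new, old, xs):
    """[new/old]xs: replace every occurrence of `old` in a list by `new`."""
    return [new if v == old else v for v in xs]


def step(program, pc, f, s):
    """One execution step; f maps variable names to values, s is a list (top first).

    Values are tuples ("un", cls, ident) or ("obj", cls, ident).
    Returns the new (pc, f, s).
    """
    instr = program[pc]
    op = instr[0]
    if op == "load":                        # push f[x] on the stack
        return pc + 1, f, [f[instr[1]]] + s
    if op == "store":                       # assumption: pop the top into x
        return pc + 1, {**f, instr[1]: s[0]}, s[1:]
    if op == "new":                         # push a fresh, hence unused, object
        return pc + 1, f, [("un", instr[1], next(_fresh))] + s
    if op == "init":                        # replace a by a' in f and in the stack
        a, rest = s[0], s[1:]
        assert a[0] == "un" and a[1] == instr[1]
        a2 = ("obj", instr[1], next(_fresh))
        f2 = {x: (a2 if v == a else v) for x, v in f.items()}
        return pc + 1, f2, subst(a2, a, rest)
    raise ValueError("unsupported instruction: %r" % (op,))


def run(program):
    """Iterate `step` from the empty state until the end of the program."""
    pc, f, s = 0, {}, []
    while pc < len(program):
        pc, f, s = step(program, pc, f, s)
    return pc, f, s


prog = [("new", "C"), ("store", "x"), ("load", "x"), ("load", "x"), ("init", "C")]
print(run(prog))
```

In the example, the object is referenced from variable x and from two stack cells before init runs; after init, both the variable and the remaining stack cell hold the same initialized value, which is the aliasing behavior the second rule is meant to capture.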



2.3 Type System 

In [7], the authors propose a set of typing rules for jvmli programs. This type 
system is based on the existence of a representation of all types for the stack 
and the variables at all lines in the program. It handles judgments of the form 

\[ F, S, i \vdash P \]

meaning that the type information for variables in F and for stacks in S is 
consistent with line i of program P. The variable F actually represents a function over 
addresses, such that F(i) is a function over variable names, associating 
variable names to types (we will write F_i instead of F(i)). Similarly, S represents 
a function over addresses, where S_i is a stack of types. Consistency of the 
type information with the program expresses that the relations between types 
of memory locations correspond to the actual instruction found at that address 
in the program. It also involves relations between types at line i and types at all 
lines where the control will be transferred after execution of this instruction. 

For instance, the typing rule for load expresses that the type information at 
the line i + 1 must indicate that some data has been added on the stack when 
compared with the type information at line i. 

\[
\frac{\begin{array}{l}
P[i] = \mathtt{load}\ x \\
F_{i+1} = F_i \\
S_{i+1} = F_i(x) \cdot S_i \\
i + 1 \in \mathit{Dom}(P)
\end{array}}{F, S, i \vdash P}
\]

For new and init there are a few details that change with respect to the 
operational semantics. While the operational semantics requires that the object 
added on top of the stack by new should be unused in the memory, the type 
system also requires that the type corresponding to this data should be unused. 
For init, the operational semantics requires that the new initialized value should 
be unused but this premise has no counterpart in the typing rule. Still, the typing 
rule for init requires that all instances of the uninitialized type found on top 
of the stack should be replaced by an initialized type. In these rules we use $\sigma$ to 
denote the type of initialized objects and $\hat{\sigma}_i$ to denote the types of uninitialized 
objects created at address i. 



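\[
\frac{\begin{array}{l}
P[i] = \mathtt{new}\ \sigma \\
F_{i+1} = F_i \\
S_{i+1} = \hat{\sigma}_i \cdot S_i \\
\hat{\sigma}_i \notin S_i \\
\forall x.\ F_i(x) \neq \hat{\sigma}_i
\end{array}}{F, S, i \vdash P}
\qquad
\frac{\begin{array}{l}
P[i] = \mathtt{init}\ \sigma \\
F_{i+1} = [\sigma/\hat{\sigma}_j]F_i \\
S_i = \hat{\sigma}_j \cdot \alpha \\
S_{i+1} = [\sigma/\hat{\sigma}_j]\alpha
\end{array}}{F, S, i \vdash P}
\]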



A singularity of this description is that all inference rules have the same conclu- 
sion, so that the proof procedures that usually handle these operational seman- 
tics descriptions 0 failed to be useful in this study. 
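
Since every rule concludes with the same judgment, one way to read the typing rules for load, new and init is as a single checking function that dispatches on the instruction found at line i. The sketch below follows that reading; the encoding of types as tuples, the function name well_typed_at, and the application of the domain premise to all three instructions are assumptions of this example.

```python
def well_typed_at(program, F, S, i):
    """Check the premises of the rule for the instruction at line i.

    F[i] : dict variable -> type, S[i] : list of types (top of stack first).
    Types are ("int",), ("otype", cls) or ("utype", cls, addr).
    """
    instr = program[i]
    op = instr[0]
    if i + 1 >= len(program):               # i + 1 must be in Dom(P)
        return False
    if op == "load":
        x = instr[1]
        return F[i + 1] == F[i] and S[i + 1] == [F[i][x]] + S[i]
    if op == "new":
        u = ("utype", instr[1], i)          # uninitialized type created at line i
        return (F[i + 1] == F[i]
                and S[i + 1] == [u] + S[i]
                and u not in S[i]
                and all(t != u for t in F[i].values()))
    if op == "init":
        if not S[i] or S[i][0][0] != "utype" or S[i][0][1] != instr[1]:
            return False
        u, alpha = S[i][0], S[i][1:]
        o = ("otype", instr[1])
        return (F[i + 1] == {x: (o if t == u else t) for x, t in F[i].items()}
                and S[i + 1] == [o if t == u else t for t in alpha])
    raise ValueError("unsupported instruction: %r" % (op,))


u0 = ("utype", "C", 0)
program = [("new", "C"), ("init", "C"), ("load", "x"), ("halt",)]
F = [{"x": ("int",)}] * 4
S = [[], [u0], [], [("int",)]]
print([well_typed_at(program, F, S, i) for i in range(3)])
```

The function takes F and S as given, which matches the remark that the type system itself does not say how this information is obtained.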



3 Consistency of the Type System 

The main theorem in [7] is a soundness theorem, saying that once we have proved 
that a program is well-typed, this program will behave in a sound manner. Here 
this decomposes into a one-step soundness theorem: if a program P is well-typed 
at address i with respect to type information given by F and S, then executing 
this instruction from a state that is consistent with F and S at address i should 
return a new state that is also consistent with F and S at the address given by 
the new program counter. 

This proof of soundness is pretty easy to perform, since the operational 
semantics rules and the typing rules are so close. However, special attention must be 
paid to the problem of initialization because of the use of substitution. At the 
operational level, initialization works by substituting all instances of the 
uninitialized object with an initialized instance. At the type-system level, the same 
operation is performed, but how are we going to ensure that exactly the same 
locations will be modified in both substitutions? 

The solution to this problem is introduced in [7] under the form of a predicate 
Consistentinit which basically expresses that whenever two locations have the 
same uninitialized type, then these locations contain the same value. In other 
terms, although there may be several values with the same uninitialized type, 
we can reason as if there was only one, because two different values will never 
occur at the same time in memory. This ensures that the substitutions in the 
operational rule for init and in the typing rule for init modify the memory in 
a consistent way. 
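
The informal reading of this predicate can be sketched as a check over one line of type information and the corresponding run-time state. The function below is a direct transcription of the sentence "two locations with the same uninitialized type contain the same value"; the name consistent_init, the tuple encoding of types and the use of strings as values are assumptions of this example, and the formal definition in [7] may be phrased differently.

```python
def consistent_init(F_i, S_i, f, s):
    """Check that equal uninitialized types always label equal values.

    F_i : dict variable -> type      f : dict variable -> value
    S_i : list of types (stack)      s : list of values, same length as S_i
    Types are ("int",), ("otype", cls) or ("utype", cls, addr).
    """
    seen = {}                                # uninitialized type -> value seen so far
    locations = [(F_i[x], f[x]) for x in F_i] + list(zip(S_i, s))
    for ty, val in locations:
        if ty[0] != "utype":
            continue                         # only uninitialized types are constrained
        if ty in seen and seen[ty] != val:
            return False                     # same type, two different values
        seen[ty] = val
    return True


u = ("utype", "C", 0)
print(consistent_init({"x": u}, [u], {"x": "a1"}, ["a1"]))
print(consistent_init({"x": u}, [u], {"x": "a1"}, ["a2"]))
```

When the check succeeds, replacing the single value labelled by an uninitialized type touches exactly the locations whose type is replaced, which is why the two substitutions stay aligned.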

The theorem of soundness is then expressed not only in terms of type con- 
sistency between the state and the type information at address i, but also 
in terms of the Consistentinit property. We also have to prove that this 
Consistentinit property is invariant through the execution of all instructions. 
Proving this invariant represents a large part of the extra work imposed by 
initialization. A more detailed presentation of the proof is given in the extended 
version of this paper [2]. We also proved a progress theorem that expresses that if 
the state is coherent with some type information and the instruction at address 
i is not a halt instruction then execution can progress at address i. The two 
theorems can be used to express that program execution for well-typed programs 
progresses without errors until it reaches a halt statement. 

4 Constructing an Effective Verifier 

The type system does not correspond to an algorithm, since it assumes that 
the values F and S have been provided. An effective byte-code verifier has to 
construct this information. According to the approach advocated in [7], one 
should first produce this data, possibly using unsafe techniques, and then use the 
type verification described in the previous section to verify that the program is 
well-typed according to that information. The approach we study in this section 
is different: we attempt to construct F and S in such a way that the program 
is sure to be well-typed if the construction succeeds: it is no longer necessary to 
check the program after the construction. 

In [12], T. Nipkow advocates the construction of the type information as the 
computation of a fix-point using Kildall’s algorithm HH. We have used a similar 
technique, based on traversing the control flow graph of the program and finding 
a least upper bound in a lattice. The lattice structure we have used is the lattice 
structure that underlies unification algorithms and we have, in fact, re-used a 
unification algorithm package that was already provided in the user libraries of 
the proof system m- However, the general approach of Nipkow was not followed 
faithfully, because the constraints we need to ensure are not completely stable 
with respect to the order used as a basis for unification. As a result, we still need 
to perform a verification pass after the data has been constructed. 



4.1 Decomposing Typing Rules into Constraints 

The typing constraints imposed for each instruction can be decomposed into 
more primitive constraints. We have isolated 8 such kinds of constraints. To 
explain the semantics of these constraints, we have a concept of typing states, with 
an order between typing states. Typing states are usually denoted with 
variables of the form t, t'. We have a function add_constraint' to add constraints. 

1. (tc_all_vars i j). This one expresses that the types of variables at lines i 
and j have to be the same. 

2. (tc_stack i j). This one expresses that the types in stacks at lines i and j 
have to be the same. 

3. (tc_top i τ). This one expresses that the type on top of the stack at line i 
has to be the type τ. 

4. (tc_pop i j). This one expresses that the stack at line j has one less element 
than the stack at line i. 

5. (tc_push i j x). This one expresses that the stack at line j is the same as 
the stack at line i where a type has been added, this type being the type of 
variable x at line i. 

6. (tc_push_type i j τ). This one is like the previous one except that the type 
is given in the constraint. 

7. (tc_store i x j). This one expresses that the variables at line j have the 
same type as at line i, except for the variable x, which receives at line j the 
type that is on top of the stack at line i; also, the stack at line j is the same 
as the stack at line i with the top element removed. 

8. (tc_init i j $\sigma$). This one expresses most of the specific constraints that 
are required for the instruction init. Its semantics is more complicated to 
describe and it is actually expressed with three properties. The first property 
expresses that the stack must have an uninitialized type on top: 

tc_init_stack_exists: 

\[ (\texttt{add\_constraint'}\ (\texttt{tc\_init}\ i\ j\ \sigma)\ t) = (\texttt{Some}\ t) \Rightarrow (\texttt{stack\_defined}\ t\ i) \Rightarrow \exists k, \alpha.\ S_t(i) = \hat{\sigma}_k \cdot \alpha \]

The second property expresses that all variables that referred to that 
uninitialized type at line i must be updated with a new initialized type at line j 
(we define a function subst on functions from VAR to types to represent the 
substitution operation). 

tc_init_frame: 

\[ (\texttt{add\_constraint'}\ (\texttt{tc\_init}\ i\ j\ \sigma)\ t) = (\texttt{Some}\ t) \Rightarrow S_t(i) = \hat{\sigma}_k \cdot \alpha \Rightarrow F_t(j) = (\mathit{subst}\ F_t(i)\ \hat{\sigma}_k\ \sigma) \]

The third property expresses the same thing for stacks (we also have a 
function subst-stk to represent the substitution operation on stacks). 

tc_init_stack: 

\[ \forall i, j, k, t, \alpha, \sigma.\ (\texttt{add\_constraint'}\ (\texttt{tc\_init}\ i\ j\ \sigma)\ t) = (\texttt{Some}\ t) \Rightarrow S_t(i) = \hat{\sigma}_k \cdot \alpha \Rightarrow S_t(j) = (\mathit{subst\text{-}stk}\ \alpha\ \hat{\sigma}_k\ \sigma) \]

In these statements, the predicate (stack_defined t i) expresses that even 
though the state t may be incomplete, it is necessary that it already contains 
enough information to know the height of the stack at line i. 

The constraints for each instruction are expressed by composing several primitive 
constraints. For instance, the constraints for (load x) at line i are the following 
ones: 

(tc_all_vars i (i + 1)) (tc_push i (i + 1) x) 

A special case is the instruction (new $\sigma$), which creates an uninitialized value, 
i.e., a value of type $\hat{\sigma}_i$ if we are at address i. We associate to this instruction the 
following constraints: 

(tc_all_vars i (i + 1)) (tc_push_type i (i + 1) $\hat{\sigma}_i$) 

These constraints do not express the requirement that the type of the 
uninitialized object, $\hat{\sigma}_i$, must not already be present in the memory at line i (this was 
expressed by the predicate Unused in the typing rule). 
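
The decomposition of instructions into primitive constraints can be pictured as a small table-driven function. Only the two decompositions stated above (for load and new) are encoded; the tuple encoding of constraints and the name constraints_for are assumptions of this sketch, and the other instructions are deliberately left out rather than guessed.

```python
def constraints_for(instr, i):
    """Primitive constraints associated with instruction `instr` at line i."""
    op = instr[0]
    if op == "load":
        x = instr[1]
        return [("tc_all_vars", i, i + 1), ("tc_push", i, i + 1, x)]
    if op == "new":
        utype = ("utype", instr[1], i)      # uninitialized type created at line i
        return [("tc_all_vars", i, i + 1), ("tc_push_type", i, i + 1, utype)]
    raise NotImplementedError("decomposition not given in the text: %r" % (op,))


def constraints_for_program(program):
    """Collect the constraints of all instructions, in program order."""
    result = []
    for i, instr in enumerate(program):
        result.extend(constraints_for(instr, i))
    return result


for c in constraints_for_program([("new", "C"), ("load", "x")]):
    print(c)
```

Note that nothing in the list produced for new records that the uninitialized type must be absent from line i; that requirement has to be checked separately.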

We use an order $\leq$ between typing information states, such that $t < t'$ means 
that t' contains strictly more information about types than t. In fact, this order 
is simply the instantiation order as used in unification theory. All constraints, 
except the constraint tc_init, are preserved through $\leq$. The requirements for 
initialization and for creating a new uninitialized object are not preserved; this 
explains why we have to depart from Kildall’s algorithm. 



4.2 Relying on Unification 

We use unifiable terms to represent successive states of the verifier and unifiable 
terms to represent constraints. Applying a constraint c to a type state t is 
implemented as applying the most general unifier of c and t to t to obtain a new 
state t'. Let us call cumulative constraints the constraints of kinds 1 to 7 in the 
enumeration above. These are the constraints that are stable with respect to the 
order $\leq$. 
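
Applying a constraint by unification can be sketched with a textbook first-order unifier. This is a generic illustration, not the unification package reused in the Coq development: variables are Python strings, compound terms are tuples whose first component is the operator name, and apply_constraint is a hypothetical stand-in that returns None on failure.

```python
def walk(t, subst):
    """Follow variable bindings until an unbound variable or a compound term."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Occurs check: does variable v appear in term t under subst?"""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])


def unify(a, b, subst):
    """Return a substitution extending subst that makes a and b equal, or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if isinstance(b, str):
        return unify(b, a, subst)
    if a[0] != b[0] or len(a) != len(b):
        return None                          # operator clash
    for x, y in zip(a[1:], b[1:]):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst


def resolve(t, subst):
    """Apply a substitution exhaustively to a term."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, subst) for a in t[1:])
    return t


def apply_constraint(c, t):
    """New state obtained by unifying constraint c with state t, or None."""
    s = unify(c, t, {})
    return None if s is None else resolve(t, s)


state = ("fcons", ("fnil",), "X")            # line 0: empty stack, rest unknown
c = ("fcons", "A", ("fcons", ("fcons", ("fint",), ("fnil",)), "B"))
print(apply_constraint(c, state))
```

In the example, the constraint says that the stack at line 1 holds exactly one integer; unification instantiates the unknown tail X of the state, so the new state carries more information than the old one, as expected for a cumulative constraint.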

The fragments F and S of the typing state actually are bi-dimensional arrays. 
For F the lines of the arrays are indexed with addresses, while the columns 
correspond to variables. For S the lines are also indexed with addresses, and each 
line gives the type of the stack at the corresponding address. When representing 
these notions as unifiable terms, they can be encoded as lists of lists. 

The unifiable terms are composed of variables and terms of the form 

\[ f_i(t_1, \ldots, t_k) \]

for a certain number of function operators f_i. These operators are as follows: 

— fcons and fnil are the constructors for lists. 

— fint is the operator for the type of integer values, (otype $\sigma$) is used for 
types of initialized objects of class $\sigma$, and (utype $\sigma$ i) for types of uninitialized 
objects created at address i. 

The initial typing state is the pair of a variable, to express that nothing is known 
about F in the beginning, and a term of the form fcons(fnil, X) to express that 
we know that the stack at line 0 is the empty stack and that we do not know 
anything about the stack on other lines yet. 

The unifiable terms corresponding to cumulative constraints are easily 
expressed as iterations of the basic function operators. To make this practical we 
define a few functions to represent these iterations. For instance, we construct 
the term 

\[ \mathtt{fcons}(X_{k+1}, \ldots\, \mathtt{fcons}(X_{k+j-1}, \mathtt{fcons}(t, X_{k+j})) \ldots) \]

by calling a function (place_one_list t j (k+1)). The third argument, k+1, 
is used to shift the indices of the variables occurring at places 1, . . . , j − 1 in 
the list. This term represents a list where the j-th element is constrained by t 
and all other elements are left unconstrained (even the length of the list is not 
constrained much, it only needs to be greater than j). 
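
The shape of the term built by place_one_list can be reproduced with a few lines of code. The sketch assumes the reading suggested by the displayed term: with third argument m, places 1 to j − 1 hold the fresh variables X_m, ..., X_{m+j-2}, place j holds t, and the tail is the fresh variable X_{m+j-1}. The tuple encoding of terms and the helper nth are assumptions of this example.

```python
def place_one_list(t, j, m):
    """Build fcons(X_m, ... fcons(X_{m+j-2}, fcons(t, X_{m+j-1})) ...).

    Variables are strings "X<n>"; compound terms are tuples (operator, args...).
    The tail variable leaves the length of the list open.
    """
    term = ("fcons", t, "X%d" % (m + j - 1))
    for p in range(j - 1, 0, -1):           # wrap places j-1 down to 1
        term = ("fcons", "X%d" % (m + p - 1), term)
    return term


def nth(term, j):
    """Element at place j (counted from 1) of a list term built with fcons."""
    for _ in range(j - 1):
        term = term[2]                      # move to the tail
    return term[1]


example = place_one_list(("fint",), 3, 1)
print(example)
print(nth(example, 3))
```

With m = k + 1 the generated variables are X_{k+1}, ..., X_{k+j}, so choosing m larger than every index already in use keeps the new variables distinct from the existing ones.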

Similarly, (mk_two_list t_1 t_2 gap i k) will construct the list whose length 
has to be greater than i + gap and whose elements at ranks i and i + gap are 
constrained by t_1 and t_2 respectively (here again the last argument, k, is used to 
shift the indices of all the extra variables inserted in the list). 

We do not describe the encoding of all constraints, but we can already express 
the encoding of the constraint (tc_push i j x) (when j > i): 

[(tc_push i j x)] = 
  (place_one_list (place_one_list X_k x (k + 2)) i (k + x + 2)), 
  (mk_two_list X_{k+1} (fcons X_k X_{k+1}) (j − i) i (k + x + i + 2)) 

Proving that the constraints are faithfully represented by the unifiable terms 
we associate to them requires that we show how functions like place_one_list 
behave with respect to some interpretation functions, mostly based on some form 
of nth function to return the element of a list. For instance, if F,S represent 
the typing state, knowing the type of variable x at line i simply requires that we 
compute the unifiable term given by (nth (nth F i) x). 

4.3 A Two Pass Algorithm 

The algorithm performs a first pass where all cumulative constraints are applied 
to the initial typing state to obtain preliminary information about all types of 
variables and stacks at all lines. The constraint tc_init for initializations is 
also applied, even though we know that it will be necessary to re-check that the 
constraint is still satisfied for the final state. 

The second pass does not modify the typing state anymore. It simply verifies 
that the final typing state does satisfy the restrictive constraints imposed by 
instructions new and init. For (new $\sigma$) at line i, it means verifying that the type 
$\hat{\sigma}_i$ occurs nowhere in the variables or the stack at line i. For (init $\sigma$) it means 
checking again that the unifiable term [(tc_init i (i + 1) $\sigma$)] unifies with the 
final state. 
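
The part of the second pass that concerns new can be sketched as a read-only scan of the final typing state. The sketch assumes that the first pass has already produced F and S as plain Python structures, and it omits the re-check for init, which would require the unification machinery; the name second_pass and the tuple encoding of types are hypothetical.

```python
def second_pass(program, F, S):
    """Lines where the requirement attached to `new` fails on the final state.

    F[i] : dict variable -> type, S[i] : list of types at line i.
    The uninitialized type created at line i is ("utype", cls, i).
    """
    failures = []
    for i, instr in enumerate(program):
        if instr[0] != "new":
            continue
        u = ("utype", instr[1], i)
        # The type created at line i must occur nowhere at line i itself.
        if u in S[i] or u in F[i].values():
            failures.append(i)
    return failures


program = [("new", "C"), ("if", 0), ("halt",)]
ok_S = [[], [("utype", "C", 0)], []]
looping_S = [[("utype", "C", 0)], [("utype", "C", 0), ("utype", "C", 0)], []]
print(second_pass(program, [{}, {}, {}], ok_S))
print(second_pass(program, [{}, {}, {}], looping_S))
```

The second typing state models a program that jumps back to the new instruction while an uninitialized object from the previous iteration is still on the stack, which is the situation the check is meant to reject.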

5 Conclusion 

The extraction mechanism of Coq makes it possible to derive from this proof 
development a program that will run on simple examples. This program is very 
likely to be impractical: no attention has been paid to the inherent complexity 
of the verification mechanism. At every iteration we construct terms whose size 
is proportional to the line number being verified: in this sense the algorithm 
complexity is already sure to be more than quadratic. 

Still, even if the exact representation of the typing state and constraints is 
likely to change to obtain a more usable verifier, we believe that the 
decomposition of its implementation and certification in the various phases presented in 
this paper is likely to remain relevant. These phases are: 

1. Proving the soundness of a type system that uses data not in the program, 

2. Proving that a program can build the missing data and ensure the typing 
constraints, 

3. Setting aside the constraints that may not be preserved through the 
refinements occurring each time a line is processed, 

4. Traversing the program according to its control flow graph. 

With a broader perspective, this development of a certified byte-code verifier 
shows that very recent investigations into the semantics of programming lan- 
guages can be completely mechanized using modern mechanical proof tools. The 
work presented here took only two months to mechanize completely and the part 
of this work that consisted in mechanizing the results found in [7] took between 
one and two weeks. This is also an example of using a type-theory based proof 
system as a programming language in the domain of program analysis tools, with 
all the benefits of the expressive type system to facilitate low-error programming 
and re-use of other programs and data-structures, as we did with the unification 
algorithm of Future development on this work will lead to more efficient, 
but still certified, implementations of this algorithm and integration in a 
more complete implementation such as the one provided in 

Abstract. A parameterized concurrent system represents an infinite 
family (of finite state systems) parameterized by a recursively defined 
type such as chains or trees. It is therefore natural to verify parameterized 
systems by inducting over this type. We employ a program transforma- 
tion based proof methodology to automate such induction proofs. Our 
proof technique is geared to automate nested induction proofs which do 
not involve strengthening of induction hypothesis. Based on this tech- 
nique, we have designed and implemented a prover for parameterized 
protocols. The prover has been used to automatically verify safety prop- 
erties of parameterized cache coherence protocols, including broadcast 
protocols and protocols with global conditions. Furthermore we also de- 
scribe its successful use in verifying mutual exclusion in the Java Meta- 
Locking Algorithm, developed recently by Sun Microsystems for ensuring 
secure access of Java objects by an arbitrary number of Java threads. 



1 Introduction 



There is a growing interest in verification of parameterized concurrent systems 
since they occur widely in computing e.g. in distributed algorithms. Intuitively, 
a parameterized system is an infinite family of finite state systems parameterized 
by a recursively defined type e.g. chains, trees. Verification of distributed algo- 
rithms (with arbitrary number of constituent processes) can be naturally cast as 
verifying parameterized systems. For example, consider a distributed algorithm 
where n users share a resource and follow some protocol to ensure mutually 
exclusive access. Model checking 1612 11241 can verify mutual exclusion for only 
finite instances of the algorithm, i.e., for n = 3, n = 4, . . ., but not for arbitrary n. 
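
The limitation can be seen on a toy example. The sketch below explores the reachable states of a simple lock-based protocol for one fixed n at a time; the protocol itself (idle, trying, critical, with entry allowed only when nobody is critical) and the function names are assumptions made for this illustration and are not taken from any of the protocols verified in this paper.

```python
from collections import deque


def successors(state):
    """Successor global states; each process is 'i', 't' or 'c'.

    A trying process may enter the critical section only if no process
    is currently in it.
    """
    for p, loc in enumerate(state):
        if loc == "i":
            yield state[:p] + ("t",) + state[p + 1:]
        elif loc == "t" and "c" not in state:
            yield state[:p] + ("c",) + state[p + 1:]
        elif loc == "c":
            yield state[:p] + ("i",) + state[p + 1:]


def mutual_exclusion_holds(n):
    """Breadth-first exploration of the n-process instance."""
    init = ("i",) * n
    seen, todo = {init}, deque([init])
    while todo:
        state = todo.popleft()
        if state.count("c") > 1:
            return False                     # two processes in the critical section
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True


for n in (3, 4, 5):
    print(n, mutual_exclusion_holds(n))
```

Each call settles the question for a single value of n, and the number of global states grows with n, so no finite sequence of such runs establishes the property for every n.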

In general, automated verification of parameterized systems has been shown 
to be undecidable [2]. Thus, verification of parameterized networks is often 
accomplished via theorem proving !i4ii7ri'ij , or by synthesizing network invariants. 
Alternatively, one can identify subclasses of parameterized systems 
for which verification is decidable. Another approach 
finitely represents the state space of a parameterized system and applies 
(symbolic) model checking over this finite representation. 

* This work was partially supported by NSF grants CCR-9711386, CCR-9876242 and 
EIA-9705998. The first author was a Ph.D. student at SUNY Stony Brook during 
part of this work. 

G. Berry, H. Comon, and A. Finkel (Eds.): CAV 2001, LNCS 2102, pp. 25-|iXJ 2001. 
(c) Springer-Verlag Berlin Heidelberg 2001 

Abhik Roychoudhury and I.V. Ramakrishnan 

Since a parameterized system represents an infinite family parameterized
by a recursively defined type, it is natural to prove properties of parameterized
systems by inducting over this type. In a recent paper [25] we outlined a
methodology for constructing such proofs by suitably extending the
resolution-based evaluation mechanism of logic programs. In our approach, the
parameterized system and the property to be verified are encoded as a logic program. The
verification problem is reduced to the problem of determining the equivalence of
predicates in this program. The predicate equivalences are then established by
transforming the predicates. The proof of semantic equivalence of two predicates
proceeds automatically by a routine induction on the structure of their
transformed definitions. One of our transformations (unfolding) represents resolution
and performs on-the-fly model checking. The others (e.g. folding) represent
deductive reasoning. Applications of these transformations are arbitrarily
interleaved in the verification proof of a parameterized system. This allows our
framework to tightly integrate algorithmic and deductive verification.



Summary of Contributions. In this paper, we employ our logic program 
transformation based approach for inductive verification of real-life parameter- 
ized protocols. The specific contributions are: 

1. We construct an automatic and programmable first order logic based prover 
with limited deductive capability. The prover can also exploit knowledge of 
network topology (chain, tree, etc.) to facilitate convergence of proofs.

2. Our program transformation based technique produces induction proofs. We 
clarify the connection between our transformations and inductive reasoning. 

3. Our technique is not restricted to specific network topologies. We have ver- 
ified chain, ring, tree, star and complete graph networks. Furthermore by 
enriching the underlying language to Constraint Logic Programming (CLP), 
the technique can be extended to verify infinite families of infinite state 
systems such as parameterized real-time systems. 

4. Besides verifying parameterized cache coherence protocols such as Berkeley
RISC and Illinois, we also report the verification of mutual exclusion in
the Java meta-locking algorithm. It is a real-life distributed algorithm recently
developed by Sun Microsystems to ensure mutual exclusion in accessing Java
objects by an arbitrary number of Java threads. Previously, the designers of
the protocol gave an informal correctness argument, and instances of the
protocol were model checked. This is the first machine-generated
proof of the algorithm parameterized by the number of threads.

The rest of the paper is organized as follows. Section 2 presents an overview
of our program transformation based proof technique for parameterized systems
presented in [25]. Section 3 clarifies the connection between program
transformations and inductive reasoning. Section 4 discusses the functioning of our
automated prover for parameterized protocols. Section 5 presents the successful
use of our prover in verifying parameterized cache coherence protocols as well
as the Java meta-locking algorithm. Finally, Section 6 concludes the paper with
related work and possible directions for future research.

System Description:

nat(0).
nat(s(X)) :- nat(X).
trans(s(X), X).
p(0).

Property Description:

efp(S) :- p(S).
efp(S) :- trans(S, T), efp(T).
thm(X) :- nat(X), efp(X).

Fig. 1. Proving Liveness in Infinite Chain.

2 Overview 

In this section, we recapitulate our core technique for inductive verification [25]
through a very simple example. Let us consider an unbounded length chain whose
states are numbered n, n - 1, ..., 0. Further suppose that the start state is n, the
end state is 0 and a proposition p is true in state 0. Suppose we want to prove
the CTL property EF p for every state in the chain. Alternatively, we can view
this chain as an infinite family of finite chains of length 0, 1, 2, ... and the proof
obligation as proving EF p for every start state of the infinite family. Either
way, our proof obligation amounts to ∀n ∈ N: n |= EF p. Our proof technique
discharges this obligation by an induction on n.




Encoding the Problem. In the above example, the states are captured by
natural numbers, which we represent by a logic program predicate nat (refer to
Figure 1; the term s(K) denotes the number K+1). The transition relation is
captured by a binary predicate trans s.t. trans(S, T) is true iff there exists
a transition from state S to state T.¹ The temporal property EF p is encoded
as a unary predicate efp s.t. for any state S, efp(S) ⇔ S |= EF p. The first
clause of efp succeeds for states in which proposition p holds. The second clause
of efp checks if a state satisfying p is reachable after a finite sequence of
transitions. Thus ∀n ∈ N: n |= EF p iff ∀X: nat(X) ⇒ efp(X). Moreover, this holds if
∀X: thm(X) ⇔ nat(X) in P0.
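
The encoding above can be mirrored in a few lines of Python to see what checking finite instances amounts to. This is a minimal sketch, not the authors' prover: the function names follow the predicates of Figure 1, states are plain integers rather than s(...) terms, and check_instances is a hypothetical helper introduced only for illustration. It establishes EF p only up to a chosen bound, which is exactly the limitation that the inductive proof removes.

```python
# Illustrative Python mirror of the Figure 1 encoding (not the authors' prover).

def trans(s):
    """Successors of state s: the chain moves from s(X) to X, i.e. n -> n - 1."""
    return [s - 1] if s > 0 else []

def p(s):
    """Proposition p holds only in the end state 0."""
    return s == 0

def efp(s):
    """s |= EF p: p holds in s, or EF p holds in some successor of s."""
    return p(s) or any(efp(t) for t in trans(s))

def check_instances(max_n):
    """Hypothetical helper: model check the instances n = 0 .. max_n one by one."""
    return all(efp(n) for n in range(max_n + 1))

print(check_instances(50))
```

Raising the bound only checks more instances; no finite bound covers every n, which is why the proof obligation is restated as the predicate equivalence between thm and nat.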

Proof by Program Transformations. We perform inductive verification via
logic program transformations using the following steps. A detailed technical
presentation of this proof technique appears in [25].

1. Encode the system and property description as a logic program P0.

2. Convert the verification proof obligation to predicate equivalence proof
   obligation(s) of the form P0 ⊢ p ≡ q (p, q are predicates).

3. Construct a transformation sequence P0, P1, ..., Pk s.t.

   (a) Semantics of P0 = Semantics of Pk

   (b) from the syntax of Pk we infer Pk ⊢ p ≡ q

¹ For realistic parameterized systems, the global transition relation is encoded
recursively in terms of local transition relations of constituent processes; see Section 4.

For our running example, the logic program encoding P0 appears in Figure 1.
We have reduced the verification proof obligation to showing the predicate
equivalence P0 ⊢ thm ≡ nat. We then transform program P0 to obtain a program Pk
where thm and nat are defined as follows.

thm(0).                     nat(0).
thm(s(X)) :- thm(X).        nat(s(X)) :- nat(X).

Thus, since the transformed definitions of thm and nat are “isomorphic”,
their semantic equivalence can be inferred from syntax. In general, we have a
sufficient condition called syntactic equivalence which is checkable in polynomial
time w.r.t. program size (refer to [25, 27] for a formal definition).
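
To make the idea of “isomorphic” definitions concrete, the following sketch checks a deliberately simplified condition: two definitions count as isomorphic if the clauses of one become exactly the clauses of the other once the predicate name is renamed. This is an assumption made for illustration and is weaker than the syntactic equivalence defined in the cited work (for instance, it does not rename variables). The tuple representation of atoms and the name isomorphic are hypothetical.

```python
# Atoms are tuples ('pred', arg, ...); a clause is (head, body) with body a tuple of atoms.

def rename(atom, mapping):
    """Rename the predicate symbol of an atom according to mapping."""
    return (mapping.get(atom[0], atom[0]),) + atom[1:]

def isomorphic(defn_p, defn_q, p, q):
    """True if renaming p to q turns the clauses of p into exactly the clauses of q."""
    m = {p: q}
    renamed = {(rename(h, m), tuple(rename(b, m) for b in body)) for h, body in defn_p}
    return renamed == set(defn_q)

# Definitions of thm and nat in the transformed program Pk.
thm_k = [(("thm", "0"), ()), (("thm", ("s", "X")), (("thm", "X"),))]
nat_k = [(("nat", "0"), ()), (("nat", ("s", "X")), (("nat", "X"),))]

# Definition of thm in the original program P0, before any transformation.
thm_0 = [(("thm", "X"), (("nat", "X"), ("efp", "X")))]

print(isomorphic(thm_k, nat_k, "thm", "nat"))
print(isomorphic(thm_0, nat_k, "thm", "nat"))
```

The check succeeds only on the transformed definitions, which illustrates why the transformation sequence is needed before the equivalence can be read off from syntax.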

Note that inferring the semantic equivalence of thm and nat based on the 
syntax of their transformed definitions in program Pk proceeds by induction on 
the “structure” of their definitions in Pk (which in this example amounts to an 
induction on the length of the chain). The program transformations employed 
in constructing the sequence P0, P1, ..., Pk correspond to different parts of this
induction proof. In the next section, we will clarify this connection between 
program transformations and inductive reasoning. 



3 Program Transformations for Inductive Verification 

Unfold/Fold Program Transformations. We transform a logic program
into another logic program by applying transformations that include unfolding
and folding. A simple illustration of these transformations appears in Figure 2.
Program P1 is obtained from P0 by unfolding the occurrence of q(X) in the
definition of p. P2 is obtained by folding q(X) in the second clause of p in P1
using the definition of p in P0 (an earlier program). Intuitively, unfolding is a
step of clause resolution, whereas folding replaces an instance of a clause body (from
some earlier program in the transformation sequence) with its head. A formal
definition of the unfold/fold transformation rules, along with a proof that any
interleaved application of the rules preserves semantics, has been given elsewhere.
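
The unfolding step can be sketched as one step of resolution on a small clause representation. The code below is a minimal illustration under stated assumptions, not the transformation system of the paper: terms are nested tuples, strings starting with an uppercase letter are variables, unification omits the occurs check, and folding is not implemented. All function names are hypothetical.

```python
import itertools

def is_var(t):
    """Variables are strings that start with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Return an extension of s unifying a and b, or None (no occurs check)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def subst(t, s):
    """Apply substitution s to term t."""
    t = walk(t, s)
    return tuple(subst(x, s) for x in t) if isinstance(t, tuple) else t

_fresh = itertools.count()

def rename(clause):
    """Rename the variables of a clause apart with a fresh suffix."""
    n = next(_fresh)
    def r(t):
        if is_var(t):
            return f"{t}_{n}"
        return tuple(r(x) for x in t) if isinstance(t, tuple) else t
    head, body = clause
    return r(head), [r(b) for b in body]

def unfold(clause, i, program):
    """Resolve the i-th body atom of clause against every clause of program."""
    head, body = clause
    result = []
    for c in program:
        h, b = rename(c)
        s = unify(body[i], h, {})
        if s is not None:
            new_body = body[:i] + b + body[i + 1:]
            result.append((subst(head, s), [subst(x, s) for x in new_body]))
    return result

# Program P0 of Figure 2: p(X) :- q(X).  q(0).  q(s(X)) :- q(X).
P0 = [(("p", "X"), [("q", "X")]),
      (("q", "0"), []),
      (("q", ("s", "X")), [("q", "X")])]

for head, body in unfold(P0[0], 0, P0):
    print(head, ":-", body)
```

Unfolding q(X) in the clause of p yields the two clauses of p in program P1, up to the names of the renamed variables.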

An Example of Inductive Verification. We now apply these transformations
to the definition of thm in the program P0 shown in Figure 1. First we
unfold nat(X) in the definition clause of thm to obtain the following clauses.
This unfolding step corresponds to uncovering the schema on which we induct,
i.e. the schema of natural numbers.

thm(0) :- efp(0).
thm(s(X)) :- nat(X), efp(s(X)).

Program P0:

p(X) :- q(X).
q(0).
q(s(X)) :- q(X).

Program P1 (obtained from P0 by unfolding):

p(0).
p(s(X)) :- q(X).
q(0).
q(s(X)) :- q(X).

Program P2 (obtained from P1 by folding):

p(0).
p(s(X)) :- p(X).
q(0).
q(s(X)) :- q(X).

Fig. 2. Illustration of Unfold/Fold Transformations.



We now repeatedly unfold efp(0). These steps correspond to showing the base
case of our induction proof. Note that showing the truth of efp(0) is a finite
state verification problem, and the unfolding steps employed to establish this
exactly correspond to on-the-fly model checking. We obtain:

thm(0).
thm(s(X)) :- nat(X), efp(s(X)).

We repeatedly unfold efp(s(X)) in the second clause of thm. These steps
correspond to the finite part of the induction step, i.e. the reasoning that allows us to
infer n + 1 |= EF p provided the induction hypothesis n |= EF p holds. We get

thm(0).
thm(s(X)) :- nat(X), efp(X).

Finally, we fold the body of the second clause of thm above using the original
definition of thm in P0. Application of this folding step enables us to recognize
the induction hypothesis (thm(X) in this case) in the induction proof.

thm(0).
thm(s(X)) :- thm(X).

The semantic equivalence of thm and nat can now be shown from their syntax
(by a routine induction on the structure of their definitions). This completes the
verification (by induction on nat).



What Kind of Induction? Since unfolding represents a resolution step, it can
be used to prove the base case and the finite part of the induction step. However,
folding recognizes the occurrence of clauses of a predicate p in an earlier program
Pj (j ≤ i) within the current program Pi. Thus, folding is not the reverse of
unfolding. It can be used to remember the induction hypothesis and recognize
its occurrence. Application of unfold/fold transformations constructs induction
proofs which proceed without strengthening the induction hypothesis. This is because the
folding rule only recognizes instances of an earlier definition of a predicate, and
does not apply any generalization. In the next section, we will discuss how our
transformation based proof technique can support nested induction proofs.

4 An Automated Prover for Parameterized Protocols

The inductive reasoning accomplished by our transformation based proof
technique has been exploited to build an automated prover for parameterized
protocols. Note that our program transformation based technique for proving
predicate equivalences can be readily extended to prove predicate implication proof
obligations of the form P0 ⊢ p ⇒ q.²

Since our transformations operate on definite logic programs (logic programs 
without negation), we only verify temporal properties with either the least or the 
greatest fixed point operator. For the rest of the paper, we restrict our attention 
to proofs of invariants.



4.1 System and Property Specification 



To use our prover, first the initial states and the transition relation of the param- 
eterized system are specified as two logic program predicates gen and trans. The 
global states of the parameterized system are represented by unbounded terms, 
and gen, trans are predicates over these terms. The recursive structure of gen 
and trans depends on the topology of the parameterized network being verified. 
For example, consider a network of similar processes where any process may 
perform an autonomous action or communicate with any other process. We can 
model the global state of this parameterized network as an unbounded list of the 
local states of the individual processes. The transition relation trans can then 
be defined over these global states as follows: 



trans([H|T], [H1|T1]) :- ltrans(H, in(Act), H1), trans_rest(T, out(Act), T1).
trans([H|T], [H1|T1]) :- ltrans(H, out(Act), H1), trans_rest(T, in(Act), T1).
trans([H|T], [H1|T]) :- ltrans(H, self(Act), H1).
trans([H|T], [H|T1]) :- trans(T, T1).

trans_rest([S|T], A, [S1|T]) :- ltrans(S, A, S1).
trans_rest([H|T], A, [H|T1]) :- trans_rest(T, A, T1).



Thus, each process can perform an autonomous action (denoted in the above as
self(A)) or an input/output action (denoted as in(A)/out(A)) where matching
input and output actions synchronize. The predicate ltrans encodes the local
transition relation of each process. For the global transition relation trans, the
last clause recursively searches the global state representation until one of the
first three rules can be applied. The third clause allows any process to make
an autonomous action. The first and second clauses correspond to the scenario
where any two processes communicate with each other. In particular, the first
(second) clause of trans allows a process to make an in(A) (out(A)) action and
invokes trans_rest to recursively search for another process which makes the
corresponding out(A) (in(A)) action.

² The proof obligation P0 ⊢ p ⇒ q formally means: for all ground substitutions
θ we have p(X)θ ∈ M(P0) ⇒ q(X)θ ∈ M(P0), where M(P0) is the set of ground
atoms which are logical consequences of the first-order formulae represented by logic
program P0.
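
The recursive structure of trans and trans_rest can be sketched in Python as generators over lists of local states. This is an illustrative sketch under assumptions: ltrans is passed in as a function returning (action, next_state) pairs, actions are (kind, name) tuples with kind in "in", "out" or "self", and the token-passing ltrans at the end is a hypothetical toy process, not one of the protocols verified in the paper.

```python
def trans_rest(rest, action, ltrans):
    """Yield every way for exactly one process in rest to perform action."""
    for i, s in enumerate(rest):
        for act, s1 in ltrans(s):
            if act == action:
                yield rest[:i] + [s1] + rest[i + 1:]

def trans(state, ltrans):
    """Yield the successor global states of a list of local states."""
    if not state:
        return
    h, t = state[0], state[1:]
    for (kind, name), h1 in ltrans(h):
        if kind == "self":
            # Third clause: the head process moves autonomously.
            yield [h1] + t
        else:
            # First and second clauses: find a partner with the matching action.
            partner = ("out" if kind == "in" else "in", name)
            for t1 in trans_rest(t, partner, ltrans):
                yield [h1] + t1
    # Last clause: leave the head unchanged and search the rest of the list.
    for t1 in trans(t, ltrans):
        yield [h] + t1

def ltrans(s):
    """Hypothetical toy process: the token holder sends it, the others receive it."""
    if s == "has":
        return [(("out", "tok"), "none")]
    return [(("in", "tok"), "has")]

for successor in trans(["has", "none", "none"], ltrans):
    print(successor)
```

As in the logic program, a synchronization is found from the process that appears first in the list, whichever of the two roles it plays, so each pair of communicating processes is considered once.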

A safety property, denoted in CTL as AG ¬bad, can be verified by proving
transition invariance. We prove that (1) a bad state is reachable only from a bad
state, and (2) none of the initial states satisfying gen are bad. This is shown by
establishing (1) bad_dest ⇒ bad_src, and (2) bad_start ⇒ false, where the
predicates bad_dest, bad_src and bad_start are defined as:

bad_dest(S, T) :- trans(S, T), bad(T).
bad_src(S, T) :- trans(S, T), bad(S).
bad_start(S) :- gen(S), bad(S).
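
For a single finite instance, the two implications can be checked by brute force. The sketch below assumes a hypothetical toy system of three processes, each either non-critical ("n") or critical ("c"), where a process may enter the critical section only when no process is in it. It illustrates what the two obligations state; it is not the inductive proof, which covers all instance sizes at once.

```python
from itertools import product

def trans(s):
    """Successors of a global state, given as a tuple of 'n' / 'c' local states."""
    for i, loc in enumerate(s):
        if loc == "n" and "c" not in s:      # enter only if nobody is critical
            yield s[:i] + ("c",) + s[i + 1:]
        elif loc == "c":                     # leave the critical section
            yield s[:i] + ("n",) + s[i + 1:]

def gen(s):
    """Initial states: every process is non-critical."""
    return all(loc == "n" for loc in s)

def bad(s):
    """Bad states: two or more processes are critical."""
    return s.count("c") >= 2

def transition_invariant(states):
    """Check (1) bad_dest => bad_src and (2) bad_start => false over states."""
    states = list(states)
    bad_dest_implies_bad_src = all(bad(s) for s in states for t in trans(s) if bad(t))
    bad_start_is_false = not any(gen(s) and bad(s) for s in states)
    return bad_dest_implies_bad_src and bad_start_is_false

print(transition_invariant(product("nc", repeat=3)))
```

Note that the check ranges over all states of the instance, reachable or not, which is what makes transition invariance a stronger requirement than the safety property itself.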



4.2 Controlling the Proof Search 

A skeleton of the proof search conducted by our prover is given below. Given a
predicate implication P0 ⊢ p ⇒ q, the prover proceeds as follows.

1. Repeatedly unfold the clauses of p and q according to an unfolding strategy
   which is designed to guarantee termination.

2. Apply folding steps to the unfolded clauses of p, q.

3. (a) Compare the transformed definitions of p and q to compute a finite set
   {(p1, q1), ..., (pk, qk)} s.t. proving P0 ⊢ pi ⇒ qi for all 1 ≤ i ≤ k completes the
   proof of P0 ⊢ p ⇒ q (i.e. p ⇒ q can then be shown via our syntactic check).

   (b) Prove P0 ⊢ p1 ⇒ q1, ..., P0 ⊢ pk ⇒ qk via program transformations.

Since the proof of each predicate implication proceeds by induction (on the
structure of their definition), nesting of the proof obligations P0 ⊢ p1 ⇒ q1, ..., P0 ⊢
pk ⇒ qk within the proof of P0 ⊢ p ⇒ q corresponds to nesting of the
corresponding induction proofs. Note that for the example in Figure 1, steps (1) and
(2) were sufficient to complete the proof and therefore step (3) did not result in
any nested proof obligations.

The above proof search skeleton forms the core of our automated prover,
which has been implemented on top of the XSB logic programming system.
Note that the proof search skeleton is nondeterministic, i.e. several unfolds or
several folds may be applicable at some step. For space considerations we omit
a full discussion of how a transformation step is selected among all applicable
transformations; a detailed discussion, including a description of how the
unfolding strategy guarantees termination, appears elsewhere. However, note that the
prover allows the user to provide some
problem-specific information at the beginning of the proof, namely (i) the network
topology (linear, tree, etc.) of the parameterized system, and (ii) which predicates in
the program encode the safety property being verified. This user guidance
enables the prover to select the transformation steps in the proof attempt (which
then proceeds without any user interaction). Below we illustrate how
the user-provided information guides the prover’s proof search.



Network Topology. The communication pattern between the different
constituent processes of a parameterized network is called its network topology. To
illustrate the role of network topology in our proof search, let us suppose that we
are proving bad_dest ⇒ bad_src (refer to Section 4.1). In the proof of bad_dest ⇒
bad_src, we first unfold and fold the clauses of bad_dest and bad_src. The
prover then compares these transformed clauses and detects new predicate
implications to be proved. In this final step, the prover exploits the knowledge of
the network topology to choose the new predicate implications. For example,
suppose the parameterized family being verified is a binary tree network whose
left and right subtrees do not communicate directly. Let the clauses of bad_dest
and bad_src after unfolding and folding be:

bad_dest(f(root1, L1, R1), f(root2, L2, R2)) :- p(L1, L2), q(R1, R2).
bad_src(f(root1, L1, R1), f(root2, L2, R2)) :- p'(L1, L2), q'(R1, R2).

Then by default p ∧ q ⇒ p' ∧ q' needs to be proved to establish bad_dest ⇒
bad_src. Instead, the prover recognizes that p, p' (q, q') are predicates defined
over left (right) subtrees. Thus it partitions the proof obligation p ∧ q ⇒ p' ∧ q'
into two separate obligations defined over the left and right subtrees (whose
transitions are independent of each other): p ⇒ p' and q ⇒ q'. In other words,
knowledge of the transition system is exploited by the prover to choose nested proof
obligations (as a heuristic for faster convergence of the proof attempt).



Predicates Encoding Temporal Property. By knowing which program
predicates encode the safety property, the prover avoids unfolding steps which
may disable deductive steps leading to a proof. To see how, note that the logic
program encoding of a verification problem for parameterized systems inherently
has a “producer-consumer” nature. For example, to prove transition invariance,
we need to show bad_dest ⇒ bad_src (refer to Section 4.1) where bad_dest(S,
T) :- trans(S, T), bad(T). The system description predicate (trans) is the
producer, since by unfolding it produces instantiations for variable T. Suppose
by unfolding trans(S, T) we instantiate variable T to a term t representing
global states of the parameterized family. Now, by unfolding bad(t) we intend
to test whether bad holds in states represented by t. In other words, the property
description predicate is a consumer. Unfolding of bad(t) should consume the
instantiation t, rather than producing further instantiation via unification. Hence
our prover incorporates heuristics to prevent unfoldings of property description
predicates which result in instantiation of variables. Such unfolding steps can
disable deductive steps converging to a proof, e.g. folding of the conjunction of trans
and bad to bad_dest. The user-provided information tells us which predicates
encode the safety property and enables us to identify these unfolding steps.

In general, to prove P0 ⊢ p ⇒ q, we first repeatedly unfold the clauses of
p and q. Deductive steps like folding are applied subsequently. Therefore, it is
possible to apply a finite sequence of unfolding steps P0 → ... → Pi → ... → Pn
s.t. a folding step applicable in program Pi which leads to a proof of P0 ⊢
p ⇒ q is disabled in Pn. One way to prevent such disabling of deductive steps
is to check for applicable deductive steps ahead of unfolding steps. However,
this would add theorem proving overheads to model checking (model checking
is accomplished by unfolding). Our goal is to perform zero-overhead theorem
proving, where deductive steps are never applied if model checking alone can
complete the verification task. The other solution is to incorporate heuristics
for identifying unfolding steps which disable deductive steps. This approach is
taken in our prover. The prover prevents any unfolding of a predicate encoding
a temporal property which generates variable instantiations.



5 Case Studies and Experimental Results 

In this section, we first illustrate the use of our prover in proving mutual
exclusion of the Java meta-locking algorithm. Then, in Section 5.2, we present
the experimental results obtained on parameterized cache coherence protocols,
including (a) single bus broadcast protocols e.g. Mesi, (b) single bus protocols
with global conditions e.g. Illinois, and (c) multiple bus hierarchical protocols.



5.1 Mutual Exclusion of Java Meta-Lock 

In recent years, Java has gained popularity as a concurrent object oriented
language, and hence substantial research efforts have been directed to efficiently
implementing the different language features. In the Java language, any object can
be synchronized upon by different threads via synchronized methods and
synchronized statements. Mutual exclusion in the access of an object is ensured
since a synchronized method first acquires a lock on the object, executes the
method and then releases the lock. To ensure fairness and efficiency in accessing
any object, each object maintains some synchronization data. Typically this
synchronization data is a FIFO queue of the threads requesting the object. Note
that to ensure mutually exclusive access of an object, it is necessary to observe a
protocol while different threads access this synchronization data. The Java
meta-locking algorithm solves this problem. It is a distributed algorithm which is
observed by each thread and any object for accessing the synchronization data of
that object. It is a time and space efficient scheme to ensure mutually exclusive
access of the synchronization data, thereby ensuring mutually exclusive access
of any object. Model checking has previously been used to verify instances of
the Java meta-locking algorithm, obtained by fixing the number of threads.

The formal model of the algorithm consists of asynchronous parallel composition
(in the sense of Milner’s CCS) of an object process, a hand-off process and
an arbitrary number of thread processes. To completely eliminate busy waiting 
by any thread, the algorithm performs a race between a thread acquiring the 
meta-lock and the thread releasing the meta-lock. The winner of this race is 
determined by the hand-off process, which serves as an arbiter. 

We model the object process without the synchronization data since we are 
only interested in verifying mutually exclusive access of this data. Apart from the 
synchronization data, the meta-locking algorithm implicitly maintains another 
queue: the queue of threads currently contending for the meta-lock to access
the synchronization data. However, for verifying mutual exclusion we only model
the length of this queue. The local state of the object process therefore contains
a natural number, the number of threads waiting for the meta-lock. This makes
the object an infinite state system.

The thread and the hand-off processes are finite state systems. A thread 
synchronizes with the object to express its intention of acquiring/releasing the 
meta-lock. A thread that faces no contention from other threads while acquir- 
ing/releasing the meta-lock is said to execute the fast path. Otherwise, it executes 
the slow path where it gets access to the meta-lock in a FIFO discipline. When 
its turn comes, it is woken up by the hand-off process which receives acquisi- 
tion/release requests from the acquiring/releasing threads. 

We straightforwardly encoded the state representations and the transitions 
in the formal model of the protocol as a logic program. The modeling of the 
protocol took less than a week, with the help of a colleague who had previously 
modeled it in a CCS-like language for model checking. A global state in the logic 
program encoding is a 3-tuple (Th, Obj, H) where Th is an unbounded list of
thread states, Obj is a state of the object process (containing an unbounded 
term representing a natural number) and H is a state of the hand-off process. 

Our prover automatically proves transition invariance for a strengthening of
the mutual exclusion invariant (the mutual exclusion invariant states that fewer
than 2 threads own the meta-lock). This strengthening was done manually, by reasoning
about the local states of the hand-off and object processes. This is because the
mutual exclusion invariant is not preserved by every transition (even though
a state violating mutual exclusion is never reachable from the initial state of
the algorithm). Thus, to prove mutual exclusion by transition invariance, the
invariant to be proved must be strengthened. Since our inductive prover cannot
strengthen the induction hypothesis in a proof, the strengthening was done manually.
However, once the strengthened invariant is supplied, the proof proceeds completely
automatically. The timings and the number of proof steps are reported in Table 1
and further discussed in the next section.
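
Why an invariant may need strengthening can be seen on a toy system that is unrelated to the meta-locking model and is assumed here purely for illustration: a counter that starts at 0 and increases by 2. The state 5 is never reachable, yet "x is not 5" is not preserved by every transition, because the unreachable state 3 steps to 5. Strengthening the bad set to all odd numbers makes the invariant hold transition by transition. All names below are hypothetical.

```python
STATES = range(0, 21)          # a finite window of counter values

def trans(x):
    """The counter increases by 2 while it stays inside the window."""
    return [x + 2] if x + 2 <= 20 else []

def init(x):
    return x == 0

def bad(x):
    """Original bad states: the counter equals 5."""
    return x == 5

def bad_strong(x):
    """Strengthened bad states: the counter is odd (this includes 5)."""
    return x % 2 == 1

def transition_invariant(bad_pred):
    """Bad states are entered only from bad states, and no initial state is bad."""
    closed = all(bad_pred(s) for s in STATES for t in trans(s) if bad_pred(t))
    no_bad_start = not any(init(s) and bad_pred(s) for s in STATES)
    return closed and no_bad_start

print(transition_invariant(bad))         # fails: 3 is not bad but steps to 5
print(transition_invariant(bad_strong))  # holds for the strengthened bad set
```

Because every originally bad state is also bad in the strengthened sense, establishing transition invariance for the strengthened set implies the original safety property.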

Recall from Section 4.1 that for transition invariance we need to show two
predicate implications, bad_start ⇒ false and bad_dest ⇒ bad_src. Since our
proof technique supports nested induction, our prover proves 39 predicate
implications (including these two) in the mutual exclusion proof of Java meta-lock.
The 37 other predicate implications are automatically discovered and proved by
our prover. The nesting depth of the inductive mutual exclusion proof is 3.



Table 1. Summary of Protocol Verification Results.

Protocol        Invariant                 Time (secs)   #Unfolding   #Deductive
Meta-Lock       #owner + #handout < 2     129.8         1981         311
Mesi            #m + #e < 2               3.2           325          69
                #m + #e = 0 ∨ #s = 0      2.9           308          63
Illinois        #dirty < 2                35.7          2501         137
Berkeley RISC   #dirty < 2                6.8           503          146
Tree-cache      #bus_with_data < 2        9.9           178          18

5.2 Experimental Results

Table 1 presents experimental results obtained using our prover: a summary of
the invariants proved along with the time taken, the number of unfolding steps
and the number of deductive steps (i.e. folding, and comparison of predicate
definitions) performed in constructing the proof. The total time involves time
taken by (a) unfolding steps, (b) deductive steps, and (c) the time to invoke
nested proof obligations. All experiments reported here were conducted on a
Sun Ultra-Enterprise workstation with two 336 MHz CPUs and 2 GB of RAM.
In the table, we have used the following notational shorthand: #s denotes the
number of processes in local state s. Mesi and Berkeley RISC are single bus
broadcast protocols. Illinois is a single bus cache coherence protocol
with global conditions which cannot be modeled as a broadcast protocol.
Tree-cache is a binary tree network which simulates the interactions between the
cache agents in a hierarchical cache coherence protocol.

The running times of our prover are slower than the times for verifying single
bus cache coherence protocols reported in [8]. Unlike [8], our prover implements
the proof search via meta-programming. It is feasible to implement our proof
search at the level of the underlying abstract machine, thereby improving
efficiency. Moreover, note that the abstraction based technique of [8] is not suitable
for verifying parameterized tree networks.

The number of deductive steps in our proofs is consistently small compared
to the number of unfolding steps, since our proof search strategy applies
deductive steps lazily. Due to its tree topology, the state representation of Tree-cache
has a different term structure. This results in a larger running time with fewer
transformation steps as compared to other cache coherence protocols. Finally,
the proof of the Java meta-locking algorithm involves nested induction over both
control and data of the protocol. This increases the number of nested proof
obligations, and hence the running time.
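
The two observations above can be read off Table 1 with a little arithmetic. The sketch below copies the figures from the table; the helper names and the derived measures (deductive steps per unfolding step, seconds per transformation step) are introduced here for illustration and are not reported in the paper.

```python
# (protocol, time in seconds, unfolding steps, deductive steps), copied from Table 1.
ROWS = [
    ("Meta-Lock", 129.8, 1981, 311),
    ("Mesi (#m + #e < 2)", 3.2, 325, 69),
    ("Mesi (#m + #e = 0 or #s = 0)", 2.9, 308, 63),
    ("Illinois", 35.7, 2501, 137),
    ("Berkeley RISC", 6.8, 503, 146),
    ("Tree-cache", 9.9, 178, 18),
]

def deductive_ratio(unfolding, deductive):
    """Deductive steps per unfolding step."""
    return deductive / unfolding

def seconds_per_step(time_s, unfolding, deductive):
    """Average time per transformation step (unfolding plus deductive)."""
    return time_s / (unfolding + deductive)

for name, time_s, unf, ded in ROWS:
    print(f"{name}: ratio {deductive_ratio(unf, ded):.2f}, "
          f"{seconds_per_step(time_s, unf, ded) * 1000:.1f} ms per step")
```

On these figures every proof uses fewer than 0.3 deductive steps per unfolding step, and among the cache coherence protocols Tree-cache has the highest average time per step, in line with the remark about its term structure.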



6 Related Work and Conclusions 



Formal verification of parameterized systems has been researched widely in the
last decade. Some of the well studied techniques include network invariants
(where a finite state process invariant is synthesized), and use of general
purpose theorem provers, e.g. PVS [32], ACL2, and Coq. In the recent
past, a lot of activity has been directed towards developing automated techniques
for verifying (classes of) parameterized systems. These include identification of
classes for which parameterized system verification is decidable, and
application of model checking over rich languages.

The rich language model checking approach finitely represents the state space 
and transition relation of a parameterized family via rich languages e.g. regular, 
tree-regular languages for linear, tree networks. Note that our approach achieves 
a different finite representation; we finitely represent infinite sets of states as 
recursively defined logic program predicates. In comparison to the rich language 
approach, our technique is not tied to specific classes of networks based on the 
choice of the rich language. Thus we have verified parameterized networks of 
various topologies, e.g. chain, ring, tree, complete graph, and star networks. Moreover,
the rich language approach constructs proofs by state space traversal (uniform 
proofs) whereas our proofs are inductive. 

Our prover is a lightweight automated inductive theorem prover for con- 
structing nested induction proofs. Note that in our approach, the induction 
schema as well as the lemmas to be used in the inductive proof must be im- 
plicit in the logic program itself. This is a limitation of our method. Besides, 
our proof technique does not support strengthening of induction hypothesis in 
an inductive proof. However, if the schema and the lemmas are implicit in the 
logic program, our syntax based transformations uncover the induction schema 
and reason about its different cases by uncovering the requisite lemmas. 

As future work, we plan to integrate automated invariant strengthening 
techniques into our proof technique. This would involve developing a proof
methodology containing both program analysis (to strengthen invariants) and 
program transformation (to inductively prove the invariants). 

Abstract. This paper describes an approach to engineering efficient model
checkers that are generic with respect to the temporal logic in which system
properties are given. The methodology is based on the “compilation” of temporal
formulas into variants of alternating tree automata called alternating Büchi tableau
automata (ABTAs). The paper gives an efficient on-the-fly model-checking
procedure for ABTAs and illustrates how translations of temporal logics into
ABTAs may be concisely specified using inference rules, which may thus be seen as
high-level definitions of “model checkers” for the logic given. Heuristics for
simplifying ABTAs are also given, as are experimental results in the CWB-NC
verification tool suggesting that, despite the generic ABTA basis, our approach can
perform better than model checkers targeted for specific logics. The ABTA-based
approach we advocate simplifies the retargeting of model checkers to different
logics, and it also allows the use of “compile-time” simplifications on ABTAs
that improve model-checker performance.



1 Introduction 



Temporal-logic model-checking algorithms determine whether or not a given system’s
behavior conforms to requirements formulated as properties in an appropriate
temporal logic. Numerous algorithms for different logics and system modeling formalisms
have been developed and implemented, and case studies have
demonstrated the utility of the technology.

Traditional model checkers work for one logic and one class of system models. For
example, one classical algorithm checks whether systems given as Kripke structures obey
properties expressed in CTL, while the automaton-based approach works on
Kripke structures and properties given in linear-time temporal logic. Other algorithms
have been developed in the context of labeled transition systems and the modal
mu-calculus, Modecharts and real-time logics, and so on. This paradigm for model
checking has yielded great research insights, but it has the disadvantage that changes to
the modeling formalism (e.g. by changing the interpretation of state and transition
labels) or the logic (e.g. by introducing domain-specific operators) necessitate a redesign
and reimplementation of the relevant model-checking algorithm. The amount of work
needed to “retarget” a model checker can be an important factor hampering the uptake
of the technology.

The goal of this paper is to demonstrate the utility of an alternative view of model
checking that relies on translating temporal formulas into intermediate structures,
alternating Büchi tableau automata (ABTAs), that a model checker then works on.
ABTAs are variants of alternating tree automata [?] that support efficient model checking
while enabling various “compile-time” optimizations to be performed. They also
support the abstract definition, via “proof rules,” of translation procedures for different
temporal logics. By factoring out the formulation of model-checking questions from
the routines that answer them, our framework simplifies retargeting model checkers to
different system formalisms and temporal logics.

The remainder of this paper develops as follows. The next section presents the
system models considered in this paper and defines ABTAs. Section 3 then develops an
efficient on-the-fly model-checking algorithm for a large class of ABTAs, and the
section following describes simplifications that may be performed on ABTAs. A method
for translating temporal logics into ABTAs is given via an extended example in
Section 5, and the section following describes an implementation and experimental results.
Section 7 discusses related work, while the final section contains our conclusions and
future work. An appendix contains full pseudo-code for the model-checking algorithm.

2 Transition Systems and Tableau Automata 

This section defines our system models and introduces alternating Büchi tableau
automata. In what follows we fix disjoint sets $\mathcal{A}$ (with elements
$p, p', p_1, \ldots$) and $\mathcal{A}_{act}$ (with elements $\theta, \theta', \theta_1, \ldots$)
of atomic state and action propositions, respectively.

2.1 Transition Systems 

Transition systems encode the operational behavior of systems. 

Definition 1. A transition system (TS) is a tuple $\langle S, A, \ell_S, \ell_A, \rightarrow, s_I \rangle$ where $S$ is a
set of states; $A$ is a set of actions; $\ell_S : S \rightarrow 2^{\mathcal{A}}$ is the state labeling function;
$\ell_A : A \rightarrow 2^{\mathcal{A}_{act}}$ is the action labeling function; $\rightarrow\ \subseteq S \times A \times S$ is the
transition relation; and $s_I \in S$ is the start state.

Intuitively, $S$ contains the states a system may enter and $A$ the atomic actions a system
may perform. The labeling functions $\ell_S$ and $\ell_A$ indicate which atomic propositions hold
of a given state or action, while $\rightarrow$ encodes the execution steps the system may engage
in and $s_I$ the initial state of the system. We write $s \xrightarrow{a} s'$ in lieu of
$\langle s, a, s' \rangle \in\ \rightarrow$.

Definition 2. Let $T = \langle S, A, \ell_S, \ell_A, \rightarrow, s_I \rangle$ be a TS.

1. A transition sequence from $s_0 \in S$ is a sequence
   $\sigma = s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_k} s_k$,
   where $0 \le k \le \infty$. We define the length of $\sigma$, $|\sigma|$, to be $k$. If $|\sigma| = \infty$ we call $\sigma$
   infinite; otherwise, it is finite.

2. An execution from $s_0$ is a maximal transition sequence from $s_0$; that is, a sequence
   $\sigma$ with the property that either $|\sigma| = \infty$, or $|\sigma| < \infty$ and
   $s_{|\sigma|} \not\xrightarrow{a} s'$ for any $a \in A$ and $s' \in S$.

If $s \in S$ then we use $\mathcal{E}_T(s)$ to denote the set of executions in $T$ from $s$.
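Definitions 1 and 2 can be made concrete with a small Python sketch. The class name `TransitionSystem`, its field names, and the helper `finite_executions` are illustrative assumptions, not part of the paper or of the CWB-NC. Only finite executions (those ending in a deadlocked state) are enumerated; infinite executions are cut off by the `bound` parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionSystem:
    """A TS in the sense of Definition 1 (field names are hypothetical)."""
    states: frozenset
    actions: frozenset
    state_labels: dict      # state  -> set of atomic state propositions
    action_labels: dict     # action -> set of atomic action propositions
    transitions: frozenset  # triples (s, a, s')
    start: str

    def successors(self, s):
        """All (a, s') with s --a--> s', in a deterministic order."""
        return sorted((a, t) for (u, a, t) in self.transitions if u == s)

    def is_deadlocked(self, s):
        """A state is deadlocked when it has no outgoing transition."""
        return not self.successors(s)


def finite_executions(ts, s, bound):
    """Yield finite executions from s with at most `bound` transitions.

    Each execution is a tuple (s0, a1, s1, ..., ak, sk) whose last state is
    deadlocked, i.e. the sequence is maximal in the sense of Definition 2.
    """
    if ts.is_deadlocked(s):
        yield (s,)
        return
    if bound == 0:
        return  # give up: this branch may belong to an infinite execution
    for a, t in ts.successors(s):
        for rest in finite_executions(ts, t, bound - 1):
            yield (s, a) + rest


ts = TransitionSystem(
    states=frozenset({"s0", "s1", "s2"}),
    actions=frozenset({"a", "b"}),
    state_labels={"s0": set(), "s1": {"done"}, "s2": set()},
    action_labels={"a": {"send"}, "b": set()},
    transitions=frozenset({("s0", "a", "s1"), ("s0", "b", "s2"), ("s2", "a", "s2")}),
    start="s0",
)
executions = list(finite_executions(ts, ts.start, bound=5))
```

In this toy system the only finite execution from `s0` ends in the deadlocked state `s1`; the branch through `s2` loops forever and is therefore never reported.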



40 Girish S. Bhat, Ranee Cleaveland, and Alex Groce 



2.2 Alternating Büchi Tableau Automata

In this paper we use alternating Büchi tableau automata (ABTAs) as an intermediate
representation for system properties. ABTAs are alternating tree automata, although
they differ in subtle and noteworthy ways from the automata introduced in [?]:
Section 7 gives details. To define ABTAs formally we first introduce the following syntactic
sets. Let $\neg$ be a distinguished negation symbol; we define
$\mathcal{L} = \mathcal{A} \cup \{\neg p \mid p \in \mathcal{A}\}$
to be the set of state literals and
$\mathcal{L}_{act} = \mathcal{A}_{act} \cup \{\neg\theta \mid \theta \in \mathcal{A}_{act}\}$ to be the set of
action literals. We also use $\Theta, \Theta', \ldots$ to range over subsets of $\mathcal{L}_{act}$. ABTAs may now
be defined as follows.

Definition 3. An alternating Büchi tableau automaton (ABTA) is a tuple
$\langle Q, \ell, \rightarrow, q_I, \mathcal{F} \rangle$
where $Q$ is a finite set of states; $\ell : Q \rightarrow \mathcal{L} \cup \{\neg, \wedge, \vee, [\Theta], \langle\Theta\rangle\}$ is the state
labeling; $\rightarrow\ \subseteq Q \times Q$, the transition relation, satisfies the condition below for all
$q \in Q$; $q_I \in Q$ is the start state; and $\mathcal{F} \subseteq 2^Q$ is the acceptance condition. The additional
condition $\rightarrow$ must satisfy is:



As ABTAs are special node-labeled graphs we use typical graph-theoretic notions,
including cycle, path, strongly-connected component, etc. We also write $q \rightarrow^{*} q'$ if there
exists a path from $q$ to $q'$ in ABTA $B$. We say that an ABTA is well-formed if, whenever
$\ell(q) = \neg$, then $q$ does not appear on a cycle of $\rightarrow$ edges. We only consider well-formed
ABTAs in what follows.

Besides alternating tree automata, ABTAs may be viewed as abstract syntax for a
fragment of the mu-calculus [?]. They may also be seen as defining system properties
in terms of how the property in question is to be “proved”, and we develop this intuition
in presenting their semantics. More specifically, an ABTA defines a property of
transition systems by encoding a “proof schema” for establishing that the property holds for
a transition system. The states in the ABTA can be seen as goals, with the labels in the
states defining the relationship that must hold between a state and its “subgoals”. So if
one wishes to show that a transition-system state $s$ “satisfies” a state $q$ in an ABTA, and
the label of $q$ is $\wedge$, then one must show that $s$ satisfies each of $q$’s children. The $[\Theta]$ and
$\langle\Theta\rangle$ labels correspond to single-step modalities; for a transition-system state $s$ to satisfy
an ABTA state $q$ whose label is $[\Theta]$, one must show that for each $s'$ such that $s \xrightarrow{a} s'$
for some $a$ “satisfying” $\Theta$, $s'$ must satisfy the (unique) successor of $q$. Finally, the
acceptance sets enable “proofs” to be infinite: an “infinite positive” proof is deemed valid
if every “path” in the proof “touches” each set in $\mathcal{F}$ infinitely often, while an infinite
“negative proof” is valid if it fails to “touch” at least one set in $\mathcal{F}$ infinitely often. (The
first clause is the same as the generalized Büchi acceptance condition defined in [?].
It should also be noted that the second clause indicates that ABTAs have a “co-Büchi”
component to their acceptance condition.) These intuitions may be formalized in terms
of “runs” of an ABTA. To define these we first introduce the following terminology.




I 



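Definition 4. Let $T = \langle S, A, \ell_S, \ell_A, \rightarrow, s_I \rangle$ be a TS with $s \in S$.

1. Let $p \in \mathcal{A}$. Then $s \models_T p$ if and only if $p \in \ell_S(s)$, and $s \models_T \neg p$ if and only if
   $p \notin \ell_S(s)$.

2. Let $\theta \in \mathcal{A}_{act}$. Then $a \models_T \theta$ if and only if $\theta \in \ell_A(a)$, and $a \models_T \neg\theta$ if and only if
   $\theta \notin \ell_A(a)$.

3. Let $\Theta \subseteq \mathcal{L}_{act}$. Then $a \models_T \Theta$ if and only if $a \models_T \theta$ for every $\theta \in \Theta$. We write
   $s \xrightarrow{\Theta} s'$ if and only if $s \xrightarrow{a} s'$ for some $a \in A$ such that $a \models_T \Theta$, and
   $s \not\xrightarrow{\Theta}$ if there is no $s'$ such that $s \xrightarrow{\Theta} s'$.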
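The third clause of Definition 4 can be illustrated with a short Python sketch. The representation of an action literal as a `(proposition, polarity)` pair and the function names are assumptions made for this example only.

```python
def action_satisfies(action_props, literals):
    """True when an action whose atomic propositions are `action_props`
    satisfies every literal in `literals`.

    A literal is a pair (prop, positive): (prop, True) stands for the
    proposition itself and (prop, False) for its negation.
    """
    return all((prop in action_props) == positive for prop, positive in literals)


def theta_successors(transitions, action_labels, s, literals):
    """States s' reachable from s by one transition whose action satisfies
    the given set of action literals."""
    return {
        t
        for (u, a, t) in transitions
        if u == s and action_satisfies(action_labels[a], literals)
    }


transitions = {("s0", "send", "s1"), ("s0", "recv", "s2")}
action_labels = {"send": {"out"}, "recv": {"in"}}

only_out = theta_successors(transitions, action_labels, "s0", [("out", True)])
not_out = theta_successors(transitions, action_labels, "s0", [("out", False)])
anything = theta_successors(transitions, action_labels, "s0", [])
impossible = theta_successors(
    transitions, action_labels, "s0", [("out", True), ("in", True)]
)
```

An empty result, as for `impossible`, corresponds to the case in which the state has no successor for the given literal set.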

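Definition 5. A run of an ABTA $B = \langle Q, \ell, \rightarrow_B, q_I, \mathcal{F} \rangle$ on a TS
$T = \langle S, A, \ell_S, \ell_A, \rightarrow_T, s_I \rangle$ is a maximal tree in which the nodes are classified as positive or
negative and are labeled by elements of $Q \times S$ as follows.

- The root of the tree is a positive node and is labeled with $\langle q_I, s_I \rangle$.

- If $\sigma$ is a positive (negative) node with label $\langle q, s \rangle$ such that $\ell(q) = \neg$ and $q \rightarrow_B q'$,
  then $\sigma$ has one negative (positive) child labeled $\langle q', s \rangle$.

- Otherwise, for a positive node $\sigma$ labeled with $\langle q, s \rangle$:

  • If $\ell(q) \in \mathcal{L}$ then $\sigma$ is a leaf.

  • If $\ell(q) = \wedge$ and $\{q' \mid q \rightarrow_B q'\} = \{q_1, \ldots, q_m\}$ then $\sigma$ has positive children
    $\sigma_1, \ldots, \sigma_m$, with $\sigma_i$ labeled by $\langle q_i, s \rangle$.

  • If $\ell(q) = \vee$ then $\sigma$ has one positive child, $\sigma'$, and $\sigma'$ is labeled with $\langle q', s \rangle$ for
    some $q' \in \{q' \mid q \rightarrow_B q'\}$.

  • If $\ell(q) = [\Theta]$, $q \rightarrow_B q'$, and $\{s' \mid s \xrightarrow{\Theta}_T s'\} = \{s_1, \ldots, s_m\}$ then $\sigma$ has
    positive children $\sigma_1, \ldots, \sigma_m$, with $\sigma_i$ labeled by $\langle q', s_i \rangle$.

  • If $\ell(q) = \langle\Theta\rangle$ and $q \rightarrow_B q'$ then $\sigma$ has one positive child $\sigma'$, and $\sigma'$ is labeled by
    $\langle q', s' \rangle$ for some $s'$ such that $s \xrightarrow{\Theta}_T s'$.

- Otherwise, for a negative node $\sigma$ labeled with $\langle q, s \rangle$:

  • If $\ell(q) \in \mathcal{L}$ then $\sigma$ is a leaf.

  • If $\ell(q) = \wedge$ then $\sigma$ has one negative child labeled with $\langle q', s \rangle$ for some
    $q' \in \{q' \mid q \rightarrow_B q'\}$.

  • If $\ell(q) = \vee$ and $\{q' \mid q \rightarrow_B q'\} = \{q_1, \ldots, q_m\}$ then $\sigma$ has negative children
    $\sigma_1, \ldots, \sigma_m$, with $\sigma_i$ labeled by $\langle q_i, s \rangle$.

  • If $\ell(q) = [\Theta]$ and $q \rightarrow_B q'$ then $\sigma$ has one negative child $\sigma'$ labeled by $\langle q', s' \rangle$
    for some $s'$ such that $s \xrightarrow{\Theta}_T s'$.

  • If $\ell(q) = \langle\Theta\rangle$, $q \rightarrow_B q'$, and $\{s' \mid s \xrightarrow{\Theta}_T s'\} = \{s_1, \ldots, s_m\}$ then $\sigma$ has
    negative children $\sigma_1, \ldots, \sigma_m$, with $\sigma_i$ labeled by $\langle q', s_i \rangle$.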

In a well-formed ABTA, every infinite path has a suffix that contains either positive or 
negative nodes, but not both. Such a path is referred to as positive in the former case 
and negative in the latter. We now define the notion of success of a run. 

Definition 6. Let $R$ be a run of ABTA $B = \langle Q, \ell, \rightarrow_B, q_I, \mathcal{F} \rangle$ on a TS
$T = \langle S, A, \ell_S, \ell_A, \rightarrow_T, s_I \rangle$.

1. A positive leaf labeled $\langle q, s \rangle$ is successful if and only if $s \models_T \ell(q)$, or $\ell(q) = [\Theta]$
   and $s \not\xrightarrow{\Theta}_T$.

2. A negative leaf labeled $\langle q, s \rangle$ is successful if and only if $s \not\models_T \ell(q)$, or
   $\ell(q) = \langle\Theta\rangle$ and $s \not\xrightarrow{\Theta}_T$.

3. A positive path is successful if and only if for each $F \in \mathcal{F}$ some $q \in F$ occurs
   infinitely often.

4. A negative path is successful if and only if for some $F \in \mathcal{F}$ there is no $q \in F$ that
   occurs infinitely often.

Run $R$ is successful if and only if every leaf and every infinite path in $R$ is successful.
TS $T$ satisfies $B$ ($T \models B$) if and only if there exists a successful run of $B$ on $T$.
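Clauses 3 and 4 of Definition 6 are easy to check for an ultimately periodic path, where the ABTA states occurring infinitely often are exactly those on the repeated loop. The sketch below assumes that restriction; the function names and the representation of the acceptance condition as a list of sets are illustrative only.

```python
def positive_path_successful(loop_states, acceptance_sets):
    """Generalized Buchi check: every acceptance set must contain some ABTA
    state that lies on the loop, i.e. occurs infinitely often."""
    loop = set(loop_states)
    return all(bool(loop & set(F)) for F in acceptance_sets)


def negative_path_successful(loop_states, acceptance_sets):
    """Co-Buchi-style check: some acceptance set must be missed entirely by
    the states occurring infinitely often."""
    loop = set(loop_states)
    return any(not (loop & set(F)) for F in acceptance_sets)


acceptance = [{"q1"}, {"q2", "q3"}]
touches_all = positive_path_successful(["q1", "q3"], acceptance)
misses_one = negative_path_successful(["q1"], acceptance)
```

For a fixed loop the two checks are complementary, which reflects the dual reading of the acceptance condition on positive and negative paths.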

It is straightforward to establish the following, where if B is an ABTA with state q then 
B[q] is the ABTA B with the start state changed to q. 

Lemma 1. Let $T$ be a TS, let $B = \langle Q, \ell, \rightarrow_B, q_I, \mathcal{F} \rangle$ be an ABTA, and let $q, q' \in Q$
be such that $q \rightarrow_B q'$ and $\ell(q) = \neg$. Then $T \models B[q]$ if and only if $T \not\models B[q']$.

Next we define the subset of and-restricted ABTAs. 

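Definition 7. ABTA $\langle Q, \ell, \rightarrow, q_I, \mathcal{F} \rangle$ is and-restricted if and only if every $q \in Q$
satisfies:

1. if $\ell(q) = \wedge$ then there is at most one $q'$ such that $q \rightarrow q'$ and $q' \rightarrow^{*} q$; and

2. if $\ell(q) = [\Theta]$ and $q \rightarrow q'$ then $q' \not\rightarrow^{*} q$.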
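The two clauses of Definition 7 can be checked directly by reachability. The following is a minimal sketch under assumed representations: `labels` maps each ABTA state to one of the strings "and", "or", "not", "box", "dia" or "lit", and `edges` maps a state to its list of children. It is quadratic and intended only to make the definition concrete.

```python
def reachable_from(edges, src):
    """States reachable from src via one or more transitions."""
    seen, stack = set(), [src]
    while stack:
        q = stack.pop()
        for r in edges.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def is_and_restricted(labels, edges):
    """Check both clauses of and-restrictedness for every state."""
    for q, label in labels.items():
        # Children q' with q' ->* q, i.e. children inside q's own SCC.
        recursive = {
            c for c in edges.get(q, ())
            if c == q or q in reachable_from(edges, c)
        }
        if label == "and" and len(recursive) > 1:
            return False
        if label == "box" and recursive:
            return False
    return True


ok = is_and_restricted(
    {"q0": "or", "q1": "and", "q2": "lit", "q3": "dia"},
    {"q0": ["q1", "q2"], "q1": ["q2", "q3"], "q3": ["q0"]},
)
two_recursive_children = is_and_restricted(
    {"q0": "and", "q1": "dia", "q2": "dia"},
    {"q0": ["q1", "q2"], "q1": ["q0"], "q2": ["q0"]},
)
box_on_cycle = is_and_restricted(
    {"q0": "box", "q1": "or"},
    {"q0": ["q1"], "q1": ["q0"]},
)
```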

And-restriction plays an important role in our model-checking procedure, and we
comment more on it here. In an and-restricted ABTA the strongly-connected component of
a state labeled by $\wedge$ can contain at most one of the state’s children; a state labeled by $[\Theta]$
on the other hand is guaranteed to belong to a different strongly-connected component
than its child. And-restrictedness differs from the notion of hesitation introduced in [?];
an ABTA would be hesitant if, roughly speaking, every strongly-connected component
of a node labeled by $\wedge$ or $[\Theta]$ contained only nodes labeled by $\wedge$ or $[\Theta]$. Nevertheless,
and-restrictedness plays the same role in our theory that hesitation does in [?]:
automata obeying these conditions give rise to more efficient model-checking routines
while still providing sufficient expressiveness to encode logics such as CTL*.

3 ABTAs and Model Checking 

Checking whether or not $T \models B$ for TS $T$ and ABTA $B$ reduces to searching for the
existence of a successful run of $B$ on $T$. This section presents an efficient on-the-fly
algorithm for this check in the setting of and-restricted ABTAs.

3.1 TSs, ABTAs, and Product Graphs 

Our ABTA model-checking algorithm works by exploring the “product graph” of an
ABTA and a TS. In what follows, fix ABTA $B = \langle Q, \ell, \rightarrow_B, q_I, \mathcal{F} \rangle$ and TS
$T = \langle S, A, \ell_S, \ell_A, \rightarrow_T, s_I \rangle$, and assume that $\mathcal{F} = \{F_0, \ldots, F_{n-1}\}$. The product graph of
$B$ and $T$ has vertex set $V = Q \times S \times \{0, \ldots, n-1\}$ and edges $E \subseteq V \times V$ defined by
$\langle \langle q, s, i \rangle, \langle q', s', i' \rangle \rangle \in E$ if and only if:

- there exist nodes $\sigma$ and $\sigma'$ in some run of $B$ on $T$ labeled $\langle q, s \rangle$ and $\langle q', s' \rangle$
  respectively and such that $\sigma'$ is a child of $\sigma$; and

- either $q \notin F_i$ and $i' = i$, or $q \in F_i$ and $i' = (i + 1) \bmod n$.

$E_N \subseteq E$ consists of those edges $\langle \langle q, s, i \rangle, \langle q', s', i' \rangle \rangle$ such that $q$ and $q'$ are in different
strongly-connected components in $B$, while $E_R = E - E_N$. We sometimes refer to $E_N$
as the nonrecursive relation and to $E_R$ as the recursive relation. A vertex $\langle q, s, i \rangle$ in the
product graph is said to be accepting if and only if $q \in F_0$ and $i = 0$.
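The index component of a product-graph vertex behaves like a counter over the acceptance sets. The following minimal sketch shows only that bookkeeping; the function names are hypothetical, and the acceptance condition is assumed to be a non-empty Python list of sets.

```python
def next_index(q, i, acceptance_sets):
    """Index of the successor vertex: advance cyclically when the current
    ABTA state q belongs to the set currently being waited for."""
    n = len(acceptance_sets)
    return (i + 1) % n if q in acceptance_sets[i] else i


def is_accepting(q, i, acceptance_sets):
    """A vertex (q, s, i) is accepting when i = 0 and q is in the first set."""
    return i == 0 and q in acceptance_sets[0]


F = [{"a"}, {"b"}]
step_from_a = next_index("a", 0, F)  # "a" is in F[0], so move on to F[1]
step_from_b = next_index("b", 0, F)  # "b" is not in F[0], so keep waiting
wrap_around = next_index("b", 1, F)  # "b" is in F[1], so wrap back to 0
```

A cycle through an accepting vertex therefore has to pass through a member of every acceptance set before the index can return to 0.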



3.2 Searching the Product Graph 

We now present an algorithm for determining if the product graph mentioned above
contains a successful run in the case that the ABTA $B$ is and-restricted. The routine is
based on the memory-efficient on-the-fly algorithm for emptiness-checking of Büchi
word automata in [?]; as is the case in that algorithm, our goal is to eliminate the
storage penalty associated with the “strongly-connected component” algorithms [?].
The alterations are necessitated by the fact that ABTAs contain conjunctive as well as
disjunctive states and are intended to accept TSs (i.e. trees) rather than words.

Like the algorithm in [?], ours employs two depth-first searches, DFS1 and DFS2,
that attempt to mark nodes as either true or false. The purpose of the former is to search
for true and false leaves in the product graph, and to “restart” the latter whenever an
accepting node is found. The latter determines whether or not the node given to it is
reachable from itself via nodes not previously traversed by DFS2. The success of DFS2
has implications for the existence of runs with successful paths. Pseudo-code for
these procedures may be found in the appendix.

When exploring $v = \langle q, s, i \rangle$, DFS1 uses the label of $q$ in $B$ and the transitions from
$s$ in $T$ to guide its search. The non-recursive successors of $v$ are processed first via
recursive calls to DFS1; if the results do not immediately imply the truth or falsity of $v$,
then DFS1 is called recursively on $v$’s recursive children. (Note that this simplifies the
treatment of negation: no explicit treatment of “infinite negative paths” is necessary in
the algorithm. Also note that since ABTAs are and-restricted, all but one of the children
of a node labeled by $\wedge$ can have their truth values determined by recursive calls to
DFS1. This latter fact is crucial to the correctness of our algorithm.) If these results are
inconclusive, and $v$ is accepting, then DFS2 is called to determine if $v$ is reachable from
itself. If this is the case, then $v$ is labeled as true. (DFS2 cycles involving false states
are, of course, not allowed.)

A subtlety arises in our setting when a recursive child $v'$ of $v$ has been visited
previously by DFS1 and $v'$ has not been marked true or false. The node $v'$ cannot
necessarily be assumed to be false, as is implicitly done in [?], because there may be a
successful cycle in the same strongly-connected component as it that was not detected
until after DFS1($v'$) terminated. To avoid needless recomputation in this case, we
maintain a dependency set for each node; these sets contain nodes that should become
true if the indicated node is found to be true. In the example above we would insert $v$
into the dependency set of $v'$; if $v'$ is later marked as true, then $v$ would be as well.
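For orientation, the two-search idea can be sketched in its simplest setting: the standard nested depth-first search for an accepting cycle in a plain directed graph, as used for Büchi word automata. This is explicitly not the ABTA algorithm described above (it has no conjunctive states, no truth marking, and no dependency sets); the graph encoding and function names are assumptions for illustration.

```python
def has_accepting_cycle(succ, start, accepting):
    """Nested DFS: report whether some accepting vertex reachable from
    `start` lies on a cycle.

    succ      : dict mapping a vertex to an iterable of successors
    accepting : set of accepting vertices
    """
    visited1, visited2 = set(), set()

    def dfs2(v, seed):
        # Second search: can `seed` be reached again? Vertices explored by
        # earlier second searches are never re-explored.
        visited2.add(v)
        for w in succ.get(v, ()):
            if w == seed:
                return True
            if w not in visited2 and dfs2(w, seed):
                return True
        return False

    def dfs1(v):
        # First search: explore all successors, then (in postorder) start
        # the second search from v if v is accepting.
        visited1.add(v)
        for w in succ.get(v, ()):
            if w not in visited1 and dfs1(w):
                return True
        return v in accepting and dfs2(v, v)

    return dfs1(start)


lasso = {0: [1], 1: [2], 2: [1]}
found = has_accepting_cycle(lasso, 0, {2})
not_found = has_accepting_cycle(lasso, 0, {0})
```

Starting the second search only after the first search has finished with a vertex is what allows the set of vertices visited by the second search to be shared across restarts, so that each vertex is explored at most twice overall.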

Theorem 8. DFS1($\langle q_I, s_I, 0 \rangle$) returns “true” if and only if $T \models B$.

Theorem 9. Let $B = \langle Q, \ell, \rightarrow_B, q_I, \mathcal{F} \rangle$ be an ABTA and
$T = \langle S, A, \ell_S, \ell_A, \rightarrow_T, s_I \rangle$ be a TS. Then DFS1($\langle q_I, s_I, 0 \rangle$) runs in time linear in the size of the product graph
of $B$ and $T$, whose vertex set is bounded in size by $|Q| \cdot |S| \cdot |\mathcal{F}|$, where $|\mathcal{F}|$ is the number
of component sets in $\mathcal{F}$.
4 Reducing ABTAs

The previous theorem indicates that the time needed to check whether or not $T \models B$
depends intimately on the number of states in $B$. Consequently, any preprocessing
that reduces the number of states in $B$ can have a significant impact on model-checker
performance. In this section we present several heuristics that may be used to eliminate
states in ABTAs.

Büchi State Set Minimalization. The ABTA acceptance condition specifies that an
infinite (positive) path in a run is successful if and only if that path contains an infinite
number of states from each of the sets of accepting states. This can only occur when a
cycle in the ABTA contains at least one state from each set of accepting states.
Moreover, a state not part of any such cycle can safely be removed from all member sets in
$\mathcal{F}$, since no infinite path going through that state can satisfy the Büchi condition.

To check for such states we perform a depth-first search for cycles that contain at 
least one member of each set of accepting states. If for a particular state such a cycle 
does not exist, that state is removed from all accepting sets that contain it. While not 
reducing the size of the ABTA directly, this transformation is important for two reasons. 

1. It improves the performance of other reductions. Some of the other reductions may 
only be applied to states that are members of the same accepting sets. Eliminating 
states from accepting sets improves their performance. 

2. The size of the product automaton is reduced. Each state in the product graph
contains an index reflecting the member set of $\mathcal{F}$ “currently” being searched for. This
search procedure is unnecessary for states not having the kind of cycle just
described; by removing these states from acceptance sets, unnecessary vertices
associated with this search can be avoided.
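The acceptance-set pruning just described can be sketched as follows. The sketch assumes that the cycles in question need not be simple, so a state qualifies exactly when it lies on some cycle and its strongly-connected component meets every acceptance set. It uses quadratic mutual reachability rather than the depth-first search mentioned in the text, and all names are hypothetical.

```python
def reachable_from(edges, src):
    """States reachable from src via one or more transitions."""
    seen, stack = set(), [src]
    while stack:
        q = stack.pop()
        for r in edges.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def prune_acceptance_sets(states, edges, acceptance_sets):
    """Remove from every acceptance set the states that lie on no cycle
    touching all acceptance sets."""
    reach = {q: reachable_from(edges, q) for q in states}
    keep = set()
    for q in states:
        if q not in reach[q]:
            continue  # q lies on no cycle at all
        # States on a common cycle with q (q's strongly-connected component).
        scc = {r for r in reach[q] if q in reach[r]}
        if all(scc & set(F) for F in acceptance_sets):
            keep.add(q)
    return [set(F) & keep for F in acceptance_sets]


states = ["a", "b", "c", "d", "e"]
edges = {"a": ["b", "e"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
pruned = prune_acceptance_sets(states, edges, [{"a", "c", "e"}, {"b"}])
```

Here `c` is dropped because its component `{c, d}` contains no member of the second set, and `e` is dropped because it lies on no cycle.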

Constant Propagation. Some atomic state propositions are uniformly true or false of 
all TS states, and these values can be propagated upwards as far as possible. 

Associative Joining. Because $\vee$ and $\wedge$ are associative we can also apply another
reduction: for any $\wedge$($\vee$)-labeled state $q$ with a transition to another $\wedge$($\vee$)-labeled state $q'$,
where $q$ and $q'$ are in the same sets of accepting states, remove the transition from $q$ to
$q'$ and add outgoing transitions from $q$ to every state to which $q'$ had a transition. This
is applied recursively (if $q'$ has a transition to another $\wedge$($\vee$)-labeled state $q''$ we also add
its outgoing transitions to $q$, and so forth). This has two benefits: (1) the state $q'$ may
become unreachable and hence removable, thereby reducing ABTA size and (2) model
checking avoids passing through $q'$ (and $q''$, etc.) in the depth-first searches starting
from $q$. Because $q$ and $q'$ must be in the same sets of accepting states, this
simplification is much more effective when performed after accepting-state set minimalization.

Quotienting via Bisimulation. The final simplification involves merging states with the
same “structure.” We do this using bisimulation [?]. Specifically, we alter the
traditional definition of bisimulation to take account of state labels and acceptance set
information, and we then quotient $B$ by this equivalence. To ensure maximum reduction this
should always be the last simplification applied.
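One way to compute such an equivalence is naive partition refinement, sketched below. This is an assumption about how the quotient could be computed, not a description of the CWB-NC routines: states start out grouped by label and acceptance-set membership, and blocks are split until every pair of states in a block has children in the same blocks.

```python
def renumber(keyed):
    """Replace arbitrary hashable keys by small integer block identifiers."""
    ids = {}
    return {q: ids.setdefault(k, len(ids)) for q, k in keyed.items()}


def bisimulation_blocks(labels, edges, membership):
    """Map each state to a block id; equal ids mean 'bisimilar'.

    labels     : state -> hashable label
    edges      : state -> iterable of child states
    membership : state -> frozenset of indices of acceptance sets containing it
    """
    block = renumber({q: (labels[q], membership[q]) for q in labels})
    while True:
        signature = {
            q: (block[q], frozenset(block[c] for c in edges.get(q, ())))
            for q in labels
        }
        refined = renumber(signature)
        # Each round can only split blocks, so an unchanged count means the
        # partition is stable.
        if len(set(refined.values())) == len(set(block.values())):
            return refined
        block = refined


labels = {"q0": "or", "q1": "p", "q2": "p", "q3": "and", "q4": "and"}
edges = {"q0": ["q1", "q2"], "q3": ["q1"], "q4": ["q2"]}
no_acceptance = {q: frozenset() for q in labels}
blocks = bisimulation_blocks(labels, edges, no_acceptance)

with_acceptance = dict(no_acceptance, q4=frozenset({0}))
blocks_acc = bisimulation_blocks(labels, edges, with_acceptance)
```

In the first call `q3` and `q4` end up in the same block and could be merged; in the second they are kept apart because only `q4` belongs to an acceptance set.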
5 Translating Temporal Formulas into ABTAs

A virtue of ABTA-based model checking is that translation procedures for temporal 
logics into ABTAs may be defined abstractly via “proof rules.” This section illustrates 
this idea via an example, by giving the rules needed to translate a variant of CTL* 
into ABTAs. The logic, which we call Generalized CTL* (GCTL*), extends CTL* by 
allowing formulas to constrain actions as well as states. While the logic itself is not 
very novel, it does contain “deviations” from CTL* that typically require alterations 
to a CTL* model checker. Our intention is to show that proof-rule-based translations, 
coupled with generic ABTA technology, can make it easier to define such “alterations”. 

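The syntax for our logic is given below, where $p \in \mathcal{A}$ and $\theta \in \mathcal{A}_{act}$.

$\mathcal{S} ::= p \mid \neg p \mid \mathcal{S} \wedge \mathcal{S} \mid \mathcal{S} \vee \mathcal{S} \mid \mathsf{A}\mathcal{P} \mid \mathsf{E}\mathcal{P}$
$\mathcal{P} ::= \theta \mid \neg\theta \mid \mathcal{S} \mid \mathcal{P} \wedge \mathcal{P} \mid \mathcal{P} \vee \mathcal{P} \mid \mathsf{X}\mathcal{P} \mid \mathcal{P}\,\mathsf{U}\,\mathcal{P} \mid \mathcal{P}\,\mathsf{V}\,\mathcal{P}$

The formulas generated by $\mathcal{S}$ are state formulas, while those generated by $\mathcal{P}$ are path
formulas. The state formulas constitute the formulas of GCTL*. In what follows we use
$\psi, \psi', \psi_1, \ldots$ to range over state formulas and $\phi, \phi', \phi_1, \ldots$ to range over path formulas.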

Semantically, the logic departs from traditional CTL* in two respects. Firstly, the
paths that path formulas are interpreted over have the form
$s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots$ and thus
contain actions as well as states. Secondly, as TSs may contain deadlocked states some
provision must be made for finite maximal paths as models. The GCTL* semantics
follows a standard convention in temporal logic by allowing the last state in a finite
maximal path to “loop” to itself; the action component of these implicit transitions is
assumed to violate all atomic action propositions $\theta \in \mathcal{A}_{act}$.

Mathematically, a state satisfies $\mathsf{A}\phi$ ($\mathsf{E}\phi$) if every execution (some execution)
emanating from the state satisfies $\phi$. An execution satisfies a state formula if the initial state
in the execution does, and it satisfies $\theta$ if the execution contains at least one transition
and the label of the first transition on the path satisfies $\theta$. A path satisfies $\neg\theta$ if either
the first transition on the path is labeled by an action not satisfying $\theta$ or the path has
no transitions. $\mathsf{X}$ represents the “next-time operator” and has the usual semantics when
the path is not deadlocked. A deadlocked path of form $s$ satisfies $\mathsf{X}\phi$ if $s$ satisfies $\phi$.
$\phi_1 \,\mathsf{U}\, \phi_2$ holds of a path if $\phi_1$ remains true until $\phi_2$ becomes true. The constructor $\mathsf{V}$ may
be thought of as a “release operator”; a path satisfies $\phi_1 \,\mathsf{V}\, \phi_2$ if $\phi_2$ remains true until $\phi_1$
“releases” the path from the obligation. This operator is the dual of the until operator.
The details of the semantics are standard and omitted.
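As a reminder of the standard semantics alluded to here, the until and release operators satisfy the usual one-step unfoldings and duality, stated below in LaTeX. These are textbook identities rather than statements taken from the paper; they are the shape that recursive tableau rules for $\mathsf{U}$ and $\mathsf{V}$ typically follow, with one non-recursive alternative and one alternative that postpones the obligation under $\mathsf{X}$.

```latex
\begin{align*}
\phi_1 \,\mathsf{U}\, \phi_2 &\;\equiv\; \phi_2 \vee \bigl(\phi_1 \wedge \mathsf{X}(\phi_1 \,\mathsf{U}\, \phi_2)\bigr) \\
\phi_1 \,\mathsf{V}\, \phi_2 &\;\equiv\; \phi_2 \wedge \bigl(\phi_1 \vee \mathsf{X}(\phi_1 \,\mathsf{V}\, \phi_2)\bigr) \\
\neg(\phi_1 \,\mathsf{U}\, \phi_2) &\;\equiv\; (\neg\phi_1) \,\mathsf{V}\, (\neg\phi_2)
\end{align*}
```

The unfoldings alone do not distinguish the two operators: until is the least solution of its equation (the obligation $\phi_2$ must eventually be met), while release is the greatest solution (the path may satisfy $\phi_2$ forever).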

In GCTL* $\mathsf{X}$ is self-dual. Thus, while the application of negation is restricted in this
logic we nevertheless have the following.

Lemma 2. Let $\psi$ be a state formula in GCTL*. Then there exists a formula $neg(\psi)$
such that any state in any TS satisfies $neg(\psi)$ if and only if it does not satisfy $\psi$.
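A formula of the kind promised by Lemma 2 can be obtained by pushing negation inward with the usual dualities, using the fact that X is self-dual under the deadlock convention above. The Python sketch below assumes a hypothetical tuple encoding of formulas, for example `("U", f, g)` for until and `("act", "a")` / `("nact", "a")` for an action proposition and its negation; it illustrates the idea and is not the CWB-NC front-end code.

```python
# Dual of every constructor; literals swap polarity and X is its own dual.
DUAL = {
    "ap": "nap", "nap": "ap",      # atomic state proposition / its negation
    "act": "nact", "nact": "act",  # atomic action proposition / its negation
    "and": "or", "or": "and",
    "A": "E", "E": "A",
    "U": "V", "V": "U",
    "X": "X",
}


def neg(formula):
    """Return a formula in the same syntax that holds exactly where
    `formula` does not."""
    op, *args = formula
    if op in ("ap", "nap", "act", "nact"):
        return (DUAL[op], args[0])
    return (DUAL[op],) + tuple(neg(arg) for arg in args)


always_until = ("A", ("U", ("act", "a"), ("ap", "p")))
negated = neg(always_until)
```

Applying `neg` twice returns the original formula, as expected of a dualization.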

Our approach to generating ABTAs from GCTL* formulas uses goal-directed rules
to construct tableaux from formulas. These rules operate on “formulas” of the form $\mathsf{E}(\Phi)$
and $\mathsf{A}(\Phi)$, where $\Phi$ is a set of path formulas. Intuitively, these terms are short-hand for
$\mathsf{E}(\bigwedge_{\phi \in \Phi} \phi)$ and $\mathsf{A}(\bigvee_{\phi \in \Phi} \phi)$, respectively. We also call a set $\Theta$ of action literals positive
if it contains some $\theta \in \mathcal{A}_{act}$. Otherwise it is referred to as negative. We use $\Theta, \Theta_1, \ldots$
and $\Gamma, \Gamma_1, \ldots$ to denote positive sets and negative sets respectively.
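Each rule is listed as: name, label : goal / subgoals. Parts that are illegible in the source are shown as “…”.

R1   $\wedge$ :  $\psi_1 \wedge \psi_2$  /  $\psi_1$, $\psi_2$
R2   $\vee$ :  $\psi_1 \vee \psi_2$  /  $\psi_1$, $\psi_2$
R3   $\vee$ :  …  /  $\psi$
R4   $\neg$ :  …  /  $\psi$
R5   … :  …  /  $\mathsf{E}(neg(\Phi))$
R6   $\wedge$ :  …  /  $\mathsf{E}(\Phi)$, $\mathsf{E}(\psi)$
R7   $\vee$ :  $\mathsf{E}(\Phi, \phi_1 \vee \phi_2)$  /  $\mathsf{E}(\Phi, \phi_1)$, $\mathsf{E}(\Phi, \phi_2)$
R8   … :  …
R9   $\vee$ :  $\mathsf{E}(\Phi, \phi_1 \,\mathsf{V}\, \phi_2)$  /  $\mathsf{E}(\Phi, \phi_1, \phi_2)$, $\mathsf{E}(\Phi, \phi_2, \mathsf{X}(\phi_1 \,\mathsf{V}\, \phi_2))$
R10  $\vee$ :  $\mathsf{E}(\Phi, \phi_1 \,\mathsf{U}\, \phi_2)$  /  $\mathsf{E}(\Phi, \phi_2)$, $\mathsf{E}(\Phi, \phi_1, \mathsf{X}(\phi_1 \,\mathsf{U}\, \phi_2))$
R11  $\langle\Theta\rangle$ :  …
R12  $\langle\langle\Gamma\rangle\rangle$ :  $\mathsf{E}(\Gamma, \mathsf{X}\phi_1, \ldots, \mathsf{X}\phi_n)$  /  …

$\Theta$ is a positive set of action literals, while $\Gamma$ is a negative set.

Fig. 1. Tableau Rules for GCTL*.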



To construct an ABTA for state formula $\psi$ one first generates the states and
transitions. Intuitively states will correspond to state formulas, with the initial state being $\psi$
itself. To generate new states from an existing state $\psi'$, one applies the rules in Figure 1
to $\psi'$ in the order specified. That is, one determines which of R1-R12 is applicable to $\psi'$,
beginning with R1, by comparing the form of $\psi'$ to the formula appearing in the “goal
position” of each rule. The label of the rule then becomes the label of the state, and the
subgoals of the rule are then added as states (if necessary), and transitions from $\psi'$ to
these states added. Leaves are labeled by the state literals they contain. This procedure
is repeated until no new states are added; it is guaranteed to terminate [?].

For notational simplicity we have introduced a new label in Rule R12. Intuitively, if
an ABTA state is labeled $\langle\langle\Gamma\rangle\rangle$ then it behaves like $\langle\Gamma\rangle$ for nondeadlocked TS states.
For deadlocked states, the state is required to satisfy the single descendant. This
operator can be encoded using the other ABTA constructs.

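To define the acceptance condition $\mathcal{F}$, suppose $\phi = \phi_1 \,\mathsf{U}\, \phi_2 \in q$ and let
$F_\phi = \{q' \in Q \mid (\phi \notin q' \text{ and } \mathsf{X}\phi \notin q') \text{ or } \phi_2 \in q'\}$. Then
$\mathcal{F} = \{F_\phi \mid \phi = \phi_1 \,\mathsf{U}\, \phi_2 \text{ and } \exists q \in Q.\ \phi \in q\}$.
We now have the following [?].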

Theorem 10. Let $\psi$ be a GCTL* formula and let $B_\psi$ be the ABTA obtained by the
translation procedure described above. Then the following hold.

1. $B_\psi$ is and-restricted.

2. Let $T = \langle S, A, \ell_S, \ell_A, \rightarrow, s_I \rangle$ be a TS. Then $s_I \models_T \psi$ if and only if $T$ is accepted by $B_\psi$.

In general $B_\psi$ will be exponential in the size of $\psi$. However, if $\psi$ falls within the GCTL
fragment of GCTL*, then $B_\psi$ is linear in the size of $\psi$.

We close this section with some comments on and-restrictedness. Our
model-checking routine only works on and-restricted ABTAs, which means that the rule-based
approach described above for producing model checkers only works if the rules
generate and-restricted ABTAs. In practice this means that in rules labeled by $\wedge$, at most one
subgoal can be “recursive”, i.e. can include the formula identified in the goal. For
temporal logics based on CTL* this restriction is not problematic, since the recursive
characterizations of standard modalities only involve one “recursive call”. For logics such
as the mu-calculus in which arbitrary recursive properties may be specified the relevant
rules would not satisfy this restriction, and the approach advocated in this paper would
not be applicable. (It should be noted, however, that sublogics of the mu-calculus, such
as the L2 fragment identified in [?], do fit into our framework.)



6 Implementation and Empirical Assessment 



To assess our ideas in practice we implemented ABTAs in the CWB-NC verification
tool [?]. The procedures we coded (in Standard ML) included: basic ABTA
manipulation routines (819 lines); the ABTA model-checking routine given in Section 3 (631
lines); and ABTA simplification routines described in Section 4 (654 lines). The
routines made heavy use of existing CWB-NC data structures for manipulating automata.
This code is “generic” in the sense that it would be used by any ABTA-based model
checker implemented in the CWB-NC.

We also implemented a front-end for GCTL* using the Process Algebra Compiler
(PAC) [?], a parser- and semantic-routine generator for the CWB-NC. We used sets
of actions as atomic action propositions and included only “true” and “false” as atomic
state propositions, with the obvious interpretations. The code for the front-end
included 214 lines of yacc and 605 lines of auxiliary code, with most of the latter
being devoted to the calculation of acceptance-set information and the implementation of
Rule 5 in Figure 1. It should be noted that of this code, approximately 15% is
GCTL*-specific; the rest could also be used in defining e.g. a CTL* model checker.

To study the performance of our implementation we used two existing case studies
included in the current distribution of the CWB-NC to compare our generic
ABTA-based model checker for GCTL* with the model checker for the L2 fragment of the
mu-calculus that is included in the CWB-NC release. The systems studied included a
rendering of the SCSI-2 Bus Protocol [?] and a description of the Slow-Scan fault-tolerant
communications protocol [?]. In both applications mu-calculus formulas encode key
properties of the systems in question. We used the existing models but translated the
formulas in question into GCTL*; we then ran our ABTA-based model checker for
GCTL* in order to compare its efficiency with the CWB-NC’s mu-calculus checker.
We also performed a deadlock-freedom check in both logics.

The properties included several involving fairness constraints. Emblematic of these
is Property 2 in [?], which asserts that any phase in the SCSI-2 protocol eventually
ends, provided the initiator in the protocol does not repeatedly issue an ATN signal.
This property may be encoded in GCTL* as follows.

AG({@begin_Phase} → (F{@end_Phase} ∨ GF{@obs_setATN, @obsplace}))

This formula asserts that along all paths, whenever the action @begin_Phase
occurs, then either the action @end_Phase is performed or at least one of the actions
@obs_setATN and @obsplace occurs infinitely often. (The @obsplace action is
needed for reasons relating to the modeling.) The corresponding mu-calculus formula used in
the case study is given below.

¬(μX.⟨@begin_Phase⟩(μY.νZ.⟨@begin_Phase⟩tt ∨ ⟨@end_Phase⟩X ∨
⟨-{@obsplace, @obs_setATN, @end_Phase}⟩Z ∨
⟨{@obsplace, @obs_setATN}⟩Y) ∨ ⟨-{@begin_Phase}⟩X)



Table 1. SCSI-2 Performance Data for ABTA Model Checker. All times are in seconds.

Reference    Unsimplified   Simplified   ABTA        Mu-calculus
# in [?]     ABTA size      ABTA size    Time        Time

1            42             24           2739.670    3423.990
2            54             8            533.400     1022.430
3            12             8            676.220     542.180
4            12             8            401.300     483.470
5            42             20           410.540     943.560
6            57             8            509.420     984.600
NoDeadlock   7              5            593.240     704.850



Tables 1 and 2 give our experimental results. For each of the formulas we record:
the size of the ABTA before and after simplification, and the running times of the
ABTA-based GCTL* model checker and the CWB-NC model checker on the equivalent
mu-calculus formula. Timing information was collected on a Sun Enterprise E450 with two
336 MHz processors and 2 GB of main memory. Some comments are in order.

- Some ABTA state-space reduction is due to our encoding of the constructs F and G 
in terms of U and V. These encodings use constants tt (“true”) and ff (“false”), 
which constant-propagation then eliminates. Introducing explicit rules for these 
constructs would yield smaller initial ABTAs at the expense of a larger set of rules. 

- The papers [?] and [?] describe several different models. In each case we used
the largest: 62,000 and 12,000 states, respectively.

- The mu-calculus model-checker implements the on-the-fly algorithm given
in [?], which runs in $O(|M| \cdot |\phi| \cdot ad(\phi))$, where $|M|$ is the size of the
system, $|\phi|$ the size of the formula, and $ad(\phi)$ the alternation depth of $\phi$.

- In the SCSI-2 example, Formulas 2, 5 and 6 involve fairness constraints, with 2 and
6 having the same shape. Formulas 1, 3 and 4 are safety properties, with 3 and 4
having the same shape. Thus, the minimized automata for 2 and 6 have the same
number of states, as do 3 and 4. That 2 and 3 have the same size is a coincidence.

- In the Slow-Scan example, only Formulas 1, 2, 8 and 9 involve fairness.

- Because the translation procedure in Figure 1 treats A by dualizing it (i.e.
converting it into ¬E¬), the ABTA for deadlock-freedom has more states than usual.

Based on the figures in the tables, we can draw the following conclusions.

1. The ABTA checker dramatically outperforms the mu-calculus checker on formulas 
involving fairness. The factor by which the time required by the latter exceeded 
that needed by the former ranged from 1.9 (SCSI-2 Property 6) to 5.3 (Slow-Scan 
Property 9), with the average being 3.1. This behavior is a result of the fact that due 
to the fairness constraints, the mu-calculus formulas all have alternation-depth 2, 
and the time-complexity of the mu-calculus routine is affected by alternation depth.

2. The ABTA model checker also outperforms the mu-calculus checker for safety
properties. In all but two cases the ABTA routine outperforms the mu-calculus routine,
with the overall average improvement factor being 1.6.
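The improvement factors quoted in these conclusions can be recomputed from the timing columns of Tables 1 and 2. The short Python script below does so; the grouping of the two deadlock-freedom checks with the non-fairness properties is an assumption made here, chosen because it reproduces the reported averages.

```python
# (ABTA time, mu-calculus time) in seconds, copied from Tables 1 and 2.
scsi2 = {
    "1": (2739.670, 3423.990), "2": (533.400, 1022.430),
    "3": (676.220, 542.180), "4": (401.300, 483.470),
    "5": (410.540, 943.560), "6": (509.420, 984.600),
    "NoDeadlock": (593.240, 704.850),
}
slow_scan = {
    "1": (2.890, 13.600), "2": (144.720, 471.780), "3": (205.580, 328.430),
    "4": (0.020, 0.080), "5": (118.790, 189.380), "6": (1.670, 2.760),
    "7": (139.210, 221.540), "8": (159.710, 409.190),
    "9": (137.630, 729.550), "no-deadlock": (205.930, 200.220),
}
# Properties stated in the text to involve fairness constraints.
fairness = {("SCSI-2", k) for k in ("2", "5", "6")}
fairness |= {("Slow-Scan", k) for k in ("1", "2", "8", "9")}

# Improvement factor = mu-calculus time / ABTA time.
factor = {}
for system, table in (("SCSI-2", scsi2), ("Slow-Scan", slow_scan)):
    for prop, (abta, mu) in table.items():
        factor[(system, prop)] = mu / abta

fair = [f for key, f in factor.items() if key in fairness]
other = [f for key, f in factor.items() if key not in fairness]

fair_average = sum(fair) / len(fair)
other_average = sum(other) / len(other)
slower_cases = sorted(key for key, f in factor.items() if f < 1)
```

Under this grouping the fairness average rounds to 3.1 and the remaining average to 1.6, and the two cases in which the ABTA checker is slower are SCSI-2 Property 3 and the Slow-Scan deadlock check.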



Table 2. Slow-Scan Performance Data for ABTA Model Checker. All times are in seconds.

Name                       Reference   Unsimplified   Simplified   ABTA       Mu-calculus
                           # in [?]    ABTA size      ABTA size    Time       Time

failures-responded         1           52             13           2.890      13.600
failures-responded-again   2           59             16           144.720    471.780
can-tick                   3           12             8            205.580    328.430
failures-possible          4           5              4            0.020      0.080
failures-possible-again    5           14             9            118.790    189.380
no-false-alarms            6           7              5            1.670      2.760
no-false-alarms-again      7           14             8            139.210    221.540
eventually-silent          8           92             14           159.710    409.190
react-on-repair            9           26             10           137.630    729.550
no-deadlock                -           7              5            205.930    200.220



7 Related Work 

Alternating tree automata are studied extensively as a basis for branching-time model 
checking in d- However, ABTAs differ from the automata in d in ways that we 
believe ease their use in practice; we summarize these below. 

Transition relation: In the authors embed propositional constructs inside the tran- 
sition relation. In ABTAs propositional constructs are used to label states. This 
offers advantages when ABTAs are simplified; for example, we may use the tradi- 
tional notion of bisimulation equivalence to minimize ABTAs. 

Negation: The automata in d do not use negation in the definition of transitions; 
ABTAs do allow the use of a negation operator to label states. This allows the 
acceptance component of an ABTA to be simpler (“Büchi-like”) than the Rabin 
condition in d and also simplifies the model-checking algorithm. 

Algorithm: Because of our Büchi-like condition and our consideration of and- 
restricted ABTAs, we are able to adapt the memory-efficient on-the-fly algorithm 
of fT31l . which is also time-efficient. The time-efficient algorithm of d relies on 
the construction of strongly-connected components, which our algorithm avoids. 

We reiterate that and-restricted alternating automata differ markedly from hesitant al- 
ternating automata as introduced in d- In particular, and-restricted ABTAs require no 
definition of “levels of weakness” or classification of states as existential/universal. The 
price we must pay is that “recursion through A” is limited. 

Another alternating-tree-automaton-based approach to model checking may be 
found in |BQ|. The algorithm relies on the use of games to avoid the construction of the 
strongly-connected components used in d. An implementation is described in d. 

Methods for simplifying Büchi word automata have been given in [19,28]. The 
papers both present simulation-based techniques for reducing the number of states in such 
automata, and show how acceptance sets for generalized Büchi automata may be 
reduced. Neither paper considers alternating or tree automata. 

The mu-calculus m has also been proposed as an intermediate language for 
translation-based model checking [BI6I16I1 81 . Tool support for this translational 
scheme remains problematic, however, owing in part to the complexity of the 
translation procedures for logics like CTL*. Our performance figures also suggest that the 
alternation-depth factor in mu-calculus model-checking algorithms has a practical 
impact: our ABTA model-checker significantly outperforms the mu-calculus checker on 
formulas with nontrivial alternation-depth. 



8 Conclusions and Directions for Future Research 

This paper presents a generic approach to building model-checkers that relies on the use 
of intermediate structures called alternating Büchi tableau automata. These automata 
support efficient model checking and simplification routines, and they also admit the 
definition of abstract proof-rule-based translation procedures for temporal formulas into 
ABTAs. This eases the task of retargeting a model-checker, since one need only specify 
the translation into ABTAs of the logic in question. We demonstrated the utility of our 
ideas by developing a translation-based model checker for a variant of CTL* . 

As future work we would like to develop automated support for the generation of 
ABTA translators from proof rules and high-level specifications of acceptance con- 
ditions. We are also interested in an efficient model-checking algorithm for all AB- 
TAs, and we would like to investigate compositional techniques for ABTAs based on 
the partial-model-checking ideas of [<13. Finally, it would be interesting to adapt the 
simulation-based automaton simplifications presented in [19,28] to ABTAs. 
A Pseudo-Code for ABTA Model Checking 

DFS2 (v = (q, s, i), v' = (q', s', i')) : bool = 
    mark v visited by DFS2. 
    C_r := {v_r ∈ V | E_r(v, v_r)}. 
    if v' ∈ C_r then return TRUE. 
    foreach v_r ∈ C_r s.t. v_r not marked FALSE do 
        if v_r not marked visited by DFS2 then 
            if DFS2(v_r, v') then return TRUE. 
    return FALSE. 

markAndPropagate (v = (q, s, i), val : bool) : bool = 
    if not val then return FALSE. 
    mark v TRUE. 
    foreach v' ∈ Depend(v) do 
        remove v' from Depend(v); 
        markAndPropagate(v', TRUE). 
    return TRUE. 

DFS1 (v = (q, s, i)) : bool = 
    if v marked TRUE then return TRUE. 
    mark v visited by DFS1. 
    C_n := {v' ∈ Q | E_n(v, v')}. 
    C_r := {v' ∈ Q | E_r(v, v')}. 
    case (h(q)) : 
        p ∈ A : 
            return (markAndPropagate(v, s ∈ l_S(p))). 
        ¬ : 
            foreach v_n ∈ C_n do 
                return (markAndPropagate(v, not DFS1(v_n))). 
        [·], ∧ : 
            foreach v_n ∈ C_n do 
                if not DFS1(v_n) then return FALSE. 
            if C_r = ∅ then 
                return (markAndPropagate(v, TRUE)). 
            foreach v_r ∈ C_r do 
                if v_r marked visited by DFS1 then 
                    insert v in Depend(v_r). 
                else 
                    if DFS1(v_r) then 
                        return (markAndPropagate(v, TRUE)). 
            if (accepting(v)) then 
                return (markAndPropagate(v, DFS2(v, v))). 
            return FALSE. 
        ∨, ⟨·⟩ : 
            foreach v_n ∈ C_n do 
                if DFS1(v_n) then 
                    return (markAndPropagate(v, TRUE)). 
            foreach v_r ∈ C_r do 
                if v_r marked visited by DFS1 then 
                    insert v in Depend(v_r). 
                else 
                    if DFS1(v_r) then 
                        return (markAndPropagate(v, TRUE)). 
            if (accepting(v)) then 
                return (markAndPropagate(v, DFS2(v, v))). 
            return FALSE. 
Abstract. We present an algorithm to generate Büchi automata from 
LTL formulae. This algorithm generates a very weak alternating co-Büchi 
automaton and then transforms it into a Büchi automaton, using a 
generalized Büchi automaton as an intermediate step. Each automaton is 
simplified on-the-fly in order to save memory and time. As usual we 
simplify the LTL formula before any treatment. We implemented this 
algorithm and compared it with Spin: the experiments show that our 
algorithm is much more efficient than Spin. The criteria of comparison 
are the size of the resulting automaton, the time of the computation and 
the memory used. Our implementation is available on the web at the 
following address: http://verif.liafa.jussieu.fr/ltl2ba 



1 Introduction 

To prove that a program satisfies some property, a standard method is to use 
Linear Time Logic (LTL) model checking. When the property is expressed with 
an LTL formula, the model checker usually transforms the negation of this 
formula into a Büchi automaton, builds the product of that automaton with the 
program, and checks this product for emptiness. In this paper we focus on the 
generation of a Büchi automaton from an LTL formula, trying to improve the 
time and space of the computation and the size of the resulting automaton. 

Spin ^ is a very popular LTL model checker. However, the algorithm it 
uses to generate a Büchi automaton from an LTL formula, presented in [3], 
may be quite slow and may need a large amount of memory, even for some 
usual LTL formulae. In particular, this algorithm behaves very badly on 
formulae with fairness conditions: it is almost impossible to use Spin to generate 
a Büchi automaton from a formula containing 5 or more fairness conditions, 
because of both the computation time and the memory needed. For example, 
consider a simple response formula $G(q \to F r)$ with $n$ fairness conditions: 

$$\theta_n = \neg((GF p_1 \wedge \dots \wedge GF p_n) \to G(q \to F r)). \qquad (1)$$

A formula of this type is very often encountered in LTL model checking. 
Moreover, the fairness conditions and the right-hand side property are usually more 
complex. The value of $n$ is very often greater than 5. Alas, in this case, Spin 
fails to produce the Büchi automaton within a reasonable amount of time and 
memory (see Table 1). 
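To make the family of formulae in (1) concrete, the following minimal sketch builds $\theta_n$ as a string. It assumes the ASCII operator syntax used by Spin-style tools (`[]` for G, `<>` for F, `!`, `&&`, `->`); the function name `theta` and the proposition names `p1`, ..., `pn` are illustrative choices, not taken from the paper.

```python
def theta(n: int) -> str:
    """Return theta_n = !((GF p1 & ... & GF pn) -> G(q -> F r)) as a string.

    Assumes Spin-style ASCII operators: [] (always), <> (eventually).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # One fairness condition GF p_i per index i.
    fairness = " && ".join(f"[]<>p{i}" for i in range(1, n + 1))
    return f"!(({fairness}) -> [](q -> <>r))"


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(theta(n))
```

The number of temporal operators grows linearly with $n$, which is what makes this family a convenient stress test for translators.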

Spin’s algorithm was improved by [1] (LTL2AUT), [2] (EQLTL), [11] (Wring): 
these papers did not modify the basis of the algorithm, but improved it using 
Table 1. Comparison on the Formulae $\theta_n$ for $1 \le n \le 10$. Time is in sec, space 
in kB. (N/A): no answer from the server within 24 h. (†): the program died, 
giving no result. 

                 Spin                 Wring                 EQLTL    LTL2BA-               LTL2BA
                 time     space       time      space       time     time      space       time     space
$\theta_1$       0.18     460         0.56      4,100       16       0.01      9           0.01     9
$\theta_2$       4.6      4,200       2.6       4,100       16       0.01      19          0.01     11
$\theta_3$       170      52,000      16        4,200       18       0.01      86          0.01     19
$\theta_4$       9,600    970,000     110       4,700       25       0.07      336         0.06     38
$\theta_5$                            1,000     6,500       135      0.70      1,600       0.37     48
$\theta_6$                            8,400     13,000      N/A      12        8,300       4.0      88
$\theta_7$                            72,000†   43,000†              220       44,000      32       175
$\theta_8$                                                           4,200     260,000     360      250
$\theta_9$                                                           97,000    1,600,000   3,000    490
$\theta_{10}$                                                                              36,000   970



the same core algorithm, rewriting LTL formulae, and simplifying the resulting 
Büchi automaton. These improvements are quite efficient but the actual 
transformation of the LTL formula to a Büchi automaton, which is similar to the 
tableau construction explained in [3], may still perform badly on some natural 
formulae such as $\theta_n$. Some experiments are presented in Table 1. Note that Wring 
is written in Perl while Spin and LTL2BA are written in C and that EQLTL is 
used through a web server. Hence the figures are still relevant but should not be 
compared literally. See Sect. 7 for more details. 

In this paper, we present a new algorithm to generate a Büchi automaton 
from an LTL formula. Our algorithm is not based on the tableau construction 
presented in [3]. Instead, using the classical construction (see e.g. [12]), we first 
produce an alternating automaton from the LTL formula, with $n$ states where 
$n$ is less than the size of the formula. This alternating automaton turns out to 
be very weak as shown by Rohde 0. Thanks to that property, instead of 
generating directly a Büchi automaton with $2^n \times 2^n$ states, we are able to build first 
a generalized Büchi automaton, that is a Büchi automaton with labels and 
accepting conditions on transitions instead of states, with at most $2^n$ states. Using 
a generalized Büchi automaton is one of the most important improvements of 
our algorithm. The best solution would be to design a model-checking algorithm 
using directly this generalized Büchi automaton, but in order to compare our 
work with other ones and to use existing model-checking algorithms, we 
transform this automaton into a classical Büchi automaton. The method we use is 
very classical, and we obtain a Büchi automaton with at most $n \times 2^n$ states. 

The second main improvement stems from our simplifications of the au- 
tomata. Since our construction goes in several steps, we are able to simplify 
the automata at each step, improving the efficiency of the following steps. The 
simplifications dramatically reduce the number of states and transitions of the 
automata, especially of the generalized Büchi automaton. Moreover, each 
simplification is performed on-the-fly during the construction of each automaton. This 
is a major improvement on a posteriori simplifications. The amount of memory 
used is about the size of the simplified automaton, instead of being the size 
of the unsimplified automaton which may be quite huge. The time needed is 
also reduced dramatically because we are exploring a much smaller part of the 
automaton during the construction. 

Using our new algorithm, we built a tool which is available on the web at 
http://verif.liafa.jussieu.fr/ltl2ba. Our tool is much more efficient than 
any other tool we have tried, in computation time and especially in memory. 
The results of our algorithm on the formulae $\theta_n$ with on-the-fly simplifications 
(LTL2BA) and with a posteriori simplifications (LTL2BA-) are detailed in 
Table 1. More experimental results are presented in Sect. 7. There we also discuss 
the size of the generated automaton. From this point of view also our algorithm is 
usually better than Spin though occasionally it may produce a bigger 
automaton. Note that Spin, LTL2BA- and LTL2BA give exactly the same resulting 
automaton on the formulae $\theta_n$. Wring and EQLTL give bigger automata. 

The paper is organized as follows. Section 2 begins with some preliminaries 
defining linear temporal logic and its semantics. Sections 3 to 5 describe our 
algorithm and some proofs of its correctness. Section 6 presents our simplification 
methods and Sect. 7 describes some experimental results. 

2 Preliminaries: Linear Temporal Logic (LTL) 

LTL was introduced to specify the properties of the executions of a system. 
A finite set Prop contains all atomic properties of states. With the standard 
Boolean operators ($\neg$, $\wedge$, $\vee$) we can only express static properties. For dynamical 
properties, we use temporal operators such as X (next), U (until), R (release), 
F (eventually) and G (always). 

Definition 1 (Syntax). The set of LTL formulae on the set Prop is defined 
by the grammar $\varphi ::= p \mid \neg\varphi \mid \varphi \vee \varphi \mid X\,\varphi \mid \varphi \,U\, \varphi$, where $p$ ranges over Prop. 

The semantics of LTL usually defines whether an execution $\sigma$ of a given 
system satisfies a formula. Actually the semantics only depends on the atomic 
propositions that hold in each state of $\sigma$. For our purpose we therefore consider 
only sequences of sets of atomic propositions. 

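Definition 2 (Semantics). Let $u = u_0 u_1 \dots$ be a word in $\Sigma^\omega$ with $\Sigma = 2^{Prop}$. 
Let $\varphi$ be an LTL formula. The relation $u \models \varphi$ ($u$ models $\varphi$) is defined as follows: 

- $u \models p$ if $p \in u_0$, 

- $u \models \neg\varphi_1$ if $u \not\models \varphi_1$, 

- $u \models \varphi_1 \vee \varphi_2$ if $u \models \varphi_1$ or $u \models \varphi_2$, 

- $u \models X\,\varphi_1$ if $u_1 u_2 \dots \models \varphi_1$, 

- $u \models \varphi_1 \,U\, \varphi_2$ if $\exists k \ge 0$, $u_k u_{k+1} \dots \models \varphi_2$ and $\forall\, 0 \le i < k$, $u_i u_{i+1} \dots \models \varphi_1$. 

Only basic operators have been defined above. We will of course also use the 
derived operators defined by: 

$$tt \stackrel{def}{=} p \vee \neg p, \quad ff \stackrel{def}{=} \neg tt, \quad \varphi_1 \wedge \varphi_2 \stackrel{def}{=} \neg(\neg\varphi_1 \vee \neg\varphi_2), \qquad (2)$$
$$\varphi_1 \,R\, \varphi_2 \stackrel{def}{=} \neg(\neg\varphi_1 \,U\, \neg\varphi_2), \quad F\,\varphi \stackrel{def}{=} tt \,U\, \varphi \quad \text{and} \quad G\,\varphi \stackrel{def}{=} ff \,R\, \varphi = \neg F \neg\varphi. \qquad (3)$$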
An LTL formula that is neither a disjunction ($\vee$) nor a conjunction ($\wedge$) is 
called a temporal formula. 

An LTL formula can be written in negative normal form, using only the 
predicates in Prop, their negations, and the operators $\vee$, $\wedge$, X, U, and R. Notice 
that this operation does not change the number of temporal operators of the 
formula. From now on, we suppose that every LTL formula is in negative normal 
form. 

Example 1. Let $\theta = \neg(GF p \to G(q \to F r))$ be our running example throughout the 
paper. The negative normal form of $\theta$ is $(ff \,R\, (tt \,U\, p)) \wedge (tt \,U\, (q \wedge (ff \,R\, \neg r)))$. 
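The conversion to negative normal form can be sketched as a short recursive function that pushes negations inward using the dualities from (2) and (3). This is only an illustration: the tuple encoding of formulae (`("ap", "p")`, `("U", a, b)`, and so on) and the function name `nnf` are assumptions made here, not the data structures of LTL2BA.

```python
# Formulae are nested tuples: ("ap", name), ("tt",), ("ff",), ("not", f),
# ("and", a, b), ("or", a, b), ("->", a, b), ("X", f), ("F", f), ("G", f),
# ("U", a, b), ("R", a, b).  This encoding is hypothetical.
DUAL = {"tt": "ff", "ff": "tt", "and": "or", "or": "and", "U": "R", "R": "U"}


def nnf(f, neg=False):
    """Return the negative normal form of f (of "not f" when neg is True)."""
    op = f[0]
    if op == "ap":
        return ("not", f) if neg else f
    if op in ("tt", "ff"):
        return (DUAL[op],) if neg else f
    if op == "not":
        return nnf(f[1], not neg)
    if op == "->":                      # a -> b  is  (not a) or b
        return nnf(("or", ("not", f[1]), f[2]), neg)
    if op == "F":                       # F a  is  tt U a
        return nnf(("U", ("tt",), f[1]), neg)
    if op == "G":                       # G a  is  ff R a
        return nnf(("R", ("ff",), f[1]), neg)
    if op == "X":                       # X is its own dual
        return ("X", nnf(f[1], neg))
    # Binary and/or/U/R: swap with the dual operator under negation.
    return (DUAL[op] if neg else op, nnf(f[1], neg), nnf(f[2], neg))


p, q, r = ("ap", "p"), ("ap", "q"), ("ap", "r")
theta = ("not", ("->", ("G", ("F", p)), ("G", ("->", q, ("F", r)))))
theta_nnf = nnf(theta)
print(theta_nnf)
```

Applied to the running example, the result corresponds to the formula given in Example 1; note that each F, G, U or R of the input yields exactly one U or R in the output, so the number of temporal operators is unchanged.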

Before any construction our algorithm simplifies the formula, using a set of 
rewriting rules that reduce the number of temporal operators. This is relevant 
since the complexity of our algorithm is based on this number. Some of these 
rules are presented in 0.IH3- We will not discuss them in this paper. 

3 LTL to Very Weak Alternating Automata (VWAA) 

This section explains a classical construction: building a VWAA from an LTL 
formula. Alternating automata have been introduced by Muller and Schupp in 
0 , 0 , |H|. Then in [0], Rohde defined VWAA as he needed them for a work on 
transfinite words. VWAA were also described in jS|. However, our definition is 
somewhat different from the classical one. 

Definition 3. A very weak alternating co-Büchi automaton is a five-tuple 
$\mathcal{A} = (Q, \Sigma, \delta, I, F)$ where: 

- $Q$ is the set of states, 

- Let $Q'$ be the set of conjunctions of elements of $Q$. The empty conjunction 
is denoted by $tt$. We identify $Q'$ with $2^Q$ in the following, 

- $\Sigma$ is the alphabet, and we let $\Sigma' = 2^\Sigma$, 

- $\delta : Q \to 2^{\Sigma' \times Q'}$ is the transition function, 

- $I \subseteq Q'$ is the set of initial states, 

- $F \subseteq Q$ is the set of final states (co-Büchi), 

- there exists a partial order on $Q$ such that $\forall q \in Q$, all the states appearing 
in $\delta(q)$ are lower or equal to $q$. 

The definition of a classical alternating automaton would be the same except 
for the last condition on the partial order. 

Remark 1. The transition function looks different from the usual definition 
($\Delta : Q \times \Sigma \to \mathcal{B}^+(Q)$). We made those changes for implementation reasons, in order 
to ease the manipulation of the data structures and to save time and space 
during the computation. The classical representation of our transition function 
is given by: 

$$\Delta(q, a) = \bigvee_{(\alpha, e) \in \delta(q),\ a \in \alpha} e. \qquad (4)$$
Fig. 1. Automaton $\mathcal{A}_\theta$. Some states (right) are inaccessible; they will be 
removed. 



Conversely we may obtain our definition from the classical one, essentially 
by taking the disjunctive normal form. Hence the two definitions are equivalent. 

Notice that in the transition function we use $\Sigma'$ instead of $\Sigma$, so that 
transitions that differ only by the action can be gathered. In practice, this usually 
greatly reduces the number of transitions. However the automaton still reads words 
in $\Sigma^\omega$. 



Example 2. A VWAA is represented in Fig. 1. States in $F$ are 
circled twice. Notice that arrows with the same origin represent one transition 
to a conjunction of states. In this example, we have: 

- $I = \{GF p \wedge F(q \wedge G \neg r)\}$, 

- $\delta(p) = \{(\Sigma_p, tt)\}$ where $\Sigma_p = \{a \in \Sigma \mid p \in a\}$, 

- $\delta(GF p) = \{(\Sigma_p, GF p), (\Sigma, GF p \wedge F p)\}$. 

A run $\sigma$ of $\mathcal{A}$ on a word $u_0 u_1 \dots \in \Sigma^\omega$ is a labeled DAG $(V, E, \lambda)$ such that: 

- $V$ is partitioned in $\bigcup_{i=0}^{\infty} V_i$ with $E \subseteq \bigcup_{i=0}^{\infty} V_i \times V_{i+1}$, 

- $\lambda : V \to Q$ is the labeling function, 

- $\lambda(V_0) \in I$ and $\forall x \in V_i$, $\exists (\alpha, e) \in \delta(\lambda(x))$, $u_i \in \alpha$ and $e = \lambda(E(x))$. 

A run $\sigma$ is accepting if any (infinite) branch in $\sigma$ has only a finite number of 
nodes labeled in $F$ (co-Büchi acceptance condition). $\mathcal{L}(\mathcal{A})$ is the set of words 
on which there exists an accepting run of $\mathcal{A}$. Note that Büchi and co-Büchi 
acceptance conditions are equivalent for VWAA; one only has to replace $F$ by 
$Q \setminus F$. 

Example 3. Here is an example of an accepting run of the automaton $\mathcal{A}_\theta$: 

[Figure: run DAG over a word beginning $\emptyset$, $\{q, r\}$, $\{p, q\}$, ..., with nodes labeled 
$GF p$, $F(q \wedge G \neg r)$ and $G \neg r$.] 
In the definition of the VWAA associated with an LTL formula, we use two 
new operators. $\otimes$ helps treating conjunctions, and $\overline{\psi}$ gives roughly the DNF of $\psi$, 
allowing us to restrict the states of the automaton to the temporal subformulae 
of $\varphi$. 

Definition 4. For $J_1, J_2 \in 2^{\Sigma' \times Q'}$ we define 

$$J_1 \otimes J_2 = \{(\alpha_1 \cap \alpha_2, e_1 \wedge e_2) \mid (\alpha_1, e_1) \in J_1 \text{ and } (\alpha_2, e_2) \in J_2\}.$$

For an LTL formula $\psi$ we define $\overline{\psi}$ by: $\overline{\psi} = \{\psi\}$ if $\psi$ is a temporal formula, 
$\overline{\psi_1 \wedge \psi_2} = \{e_1 \wedge e_2 \mid e_1 \in \overline{\psi_1} \text{ and } e_2 \in \overline{\psi_2}\}$ and $\overline{\psi_1 \vee \psi_2} = \overline{\psi_1} \cup \overline{\psi_2}$. 

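Here is the first step of our algorithm, building a VWAA from an LTL 
formula. Notice that the number of states of this automaton is at most the size of 
the formula. 

Step 1. Let $\varphi$ be an LTL formula on a set Prop. We define the VWAA $\mathcal{A}_\varphi$ by: 

- $Q$ is the set of temporal subformulae of $\varphi$, 

- $\Sigma = 2^{Prop}$, 

- $I = \overline{\varphi}$, 

- $F$ is the set of until subformulae of $\varphi$, that is formulae of type $\psi_1 \,U\, \psi_2$, 

- $\delta$ is defined as follows ($\Delta$ extends $\delta$ to all subformulae of $\varphi$): 

$$\begin{aligned}
\delta(tt) &= \{(\Sigma, tt)\} \\
\delta(p) &= \{(\Sigma_p, tt)\} \quad \text{where } \Sigma_p = \{a \in \Sigma \mid p \in a\} \\
\delta(\neg p) &= \{(\Sigma_{\neg p}, tt)\} \quad \text{where } \Sigma_{\neg p} = \Sigma \setminus \Sigma_p \\
\delta(X\,\psi) &= \{(\Sigma, e) \mid e \in \overline{\psi}\} \\
\delta(\psi_1 \,U\, \psi_2) &= \Delta(\psi_2) \cup (\Delta(\psi_1) \otimes \{(\Sigma, \psi_1 \,U\, \psi_2)\}) \\
\delta(\psi_1 \,R\, \psi_2) &= \Delta(\psi_2) \otimes (\Delta(\psi_1) \cup \{(\Sigma, \psi_1 \,R\, \psi_2)\}) \\
\Delta(\psi) &= \delta(\psi) \quad \text{if } \psi \text{ is a temporal formula} \\
\Delta(\psi_1 \vee \psi_2) &= \Delta(\psi_1) \cup \Delta(\psi_2) \\
\Delta(\psi_1 \wedge \psi_2) &= \Delta(\psi_1) \otimes \Delta(\psi_2)
\end{aligned}$$

Using the partial order “subformula of” it is easy to prove that $\mathcal{A}_\varphi$ is very weak. 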



Remark 2. One can notice that the elements of $\Sigma'$ used in our definition are 
intersections of the sets $\Sigma$, $\Sigma_p$ and $\Sigma_{\neg p}$. Hence, they can be denoted by 
conjunctions of literals, as in the following examples: $p \wedge q \wedge \neg r$ for $\Sigma_p \cap \Sigma_q \cap \Sigma_{\neg r}$, 
$tt$ for $\Sigma$. Note that intersection and test for inclusion can be easily performed 
with this representation. 
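The transition function of Step 1 can be sketched directly from its defining equations, using the representation of Remark 2. The sketch below is illustrative only and rests on several assumptions that are not in the paper: formulae are nested tuples in negative normal form, a label in $\Sigma'$ is a frozenset of literal strings such as `"p"` or `"!p"` (so intersecting letter sets means taking the union of literals), a conjunction of states is a frozenset with $tt$ as the empty set, contradictory labels are dropped, and $\delta(ff)$ is taken to be empty.

```python
from itertools import product


def bar(f):
    """Roughly the DNF of f: a set of conjunctions of temporal formulae."""
    if f == ("tt",):
        return {frozenset()}            # tt is the empty conjunction
    if f[0] == "and":
        return {a | b for a, b in product(bar(f[1]), bar(f[2]))}
    if f[0] == "or":
        return bar(f[1]) | bar(f[2])
    return {frozenset([f])}


def consistent(label):
    """A label is inconsistent if it contains both p and !p."""
    return not any("!" + lit in label for lit in label if not lit.startswith("!"))


def otimes(j1, j2):
    """J1 (x) J2: pairwise intersection of labels, conjunction of targets."""
    return {(a1 | a2, e1 | e2)
            for (a1, e1), (a2, e2) in product(j1, j2) if consistent(a1 | a2)}


def delta(f):
    """Transitions of a temporal formula, as a set of (label, targets)."""
    op = f[0]
    if op == "tt":
        return {(frozenset(), frozenset())}
    if op == "ff":
        return set()                    # assumption: ff has no transition
    if op == "ap":
        return {(frozenset([f[1]]), frozenset())}
    if op == "not":                     # NNF: negation only on propositions
        return {(frozenset(["!" + f[1][1]]), frozenset())}
    if op == "X":
        return {(frozenset(), e) for e in bar(f[1])}
    loop = {(frozenset(), frozenset([f]))}
    if op == "U":
        return Delta(f[2]) | otimes(Delta(f[1]), loop)
    if op == "R":
        return otimes(Delta(f[2]), Delta(f[1]) | loop)
    raise ValueError(f"not a temporal formula: {f!r}")


def Delta(f):
    """Extension of delta to disjunctions and conjunctions."""
    if f[0] == "or":
        return Delta(f[1]) | Delta(f[2])
    if f[0] == "and":
        return otimes(Delta(f[1]), Delta(f[2]))
    return delta(f)


p = ("ap", "p")
Fp = ("U", ("tt",), p)
GFp = ("R", ("ff",), Fp)
print(delta(GFp))
```

On the subformula $GF p$ this yields the two transitions listed in Example 2: one labeled $p$ going back to $GF p$, and one labeled $tt$ going to $GF p \wedge F p$.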



Example 4. Figure 1 shows the result of Step 1 on the formula $\theta$ defined in Ex. 1. 

Theorem 1. $\mathcal{L}(\mathcal{A}_\varphi) = \{u \in \Sigma^\omega \mid u \models \varphi\}$. 

Proof. The idea of the proof is to show recursively that for any subformula $\psi$ of 
$\varphi$, the language accepted by $\mathcal{A}_\varphi$ with $I = \overline{\psi}$ is equal to $\{u \in \Sigma^\omega \mid u \models \psi\}$. The 
main difficulties are encountered for $\psi = \psi_1 \,U\, \psi_2$ (this is where the acceptance 
condition comes into play) and $\psi = \psi_1 \wedge \psi_2$. □ 
Fig. 2. Automaton $\mathcal{G}_{\mathcal{A}_\theta}$ before (left) and after (right) Simplification. 



4 VWAA to Generalized Büchi Automata (GBA) 

At that point we have obtained a VWAA for our LTL formula $\varphi$. The problem 
is that the usual method to transform an alternating automaton into a Büchi 
automaton produces an automaton that is much too big. This is why we generate 
first a GBA, which is a Büchi automaton with several acceptance conditions on 
transitions instead of states. 

Definition 5. A generalized Büchi automaton is a five-tuple $\mathcal{G} = (Q, \Sigma, \delta, I, \mathcal{T})$ 
where: 

- $Q$ is the set of states, 

- $\Sigma$ is the alphabet, and we let $\Sigma' \subseteq 2^\Sigma$, 

- $\delta : Q \to 2^{\Sigma' \times Q}$ is the transition function, 

- $I \subseteq Q$ is the set of initial states, 

- $\mathcal{T} = \{T_1, \dots, T_r\}$ where $T_j \subseteq Q \times \Sigma' \times Q$ are the accepting transitions. 



Example 5. The automata on Fig. 2 are examples of GBAs. In these examples, 
$r = 2$: dashed transitions are in $T_1$ and bold transitions are in $T_2$. An 
accepting run has to use infinitely many dashed transitions and infinitely many bold 
transitions. 

A run $\sigma$ of $\mathcal{G}$ on a word $u_0 u_1 \dots \in \Sigma^\omega$ is a sequence $q_0, q_1, \dots$ of elements of 
$Q$ such that $q_0 \in I$ and $\forall i \ge 0$, $\exists \alpha_i \in \Sigma'$ such that $u_i \in \alpha_i$ and $(\alpha_i, q_{i+1}) \in \delta(q_i)$. 
A run $\sigma$ is accepting if for each $1 \le j \le r$ it uses infinitely many transitions from 
$T_j$. $\mathcal{L}(\mathcal{G})$ is the set of words on which there exists an accepting run of $\mathcal{G}$. 

Here is the second step of our algorithm, building a GBA from a co-Büchi 
VWAA. It can of course be applied to any VWAA, and not only to an automaton 
issued from Step 1. $\mathcal{G}_\mathcal{A}$ has at most $2^{|Q|}$ states and $|F|$ acceptance sets. 

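Step 2. Let $\mathcal{A} = (Q, \Sigma, \delta, I, F)$ be a VWAA with co-Büchi acceptance 
conditions. We define the GBA $\mathcal{G}_\mathcal{A} = (Q', \Sigma, \delta', I, \mathcal{T})$ where: 

- $Q' = 2^Q$ is identified with conjunctions of states as explained in Definition 3, 

- $\delta''(q_1 \wedge \dots \wedge q_n) = \bigotimes_{i=1}^{n} \delta(q_i)$, 

- $\delta'$ is the set of $\preccurlyeq$-minimal transitions of $\delta''$ where the relation $\preccurlyeq$ is defined 
by $t' \preccurlyeq t$ if $t = (e, \alpha, e')$, $t' = (e, \alpha', e'')$, $\alpha \subseteq \alpha'$, $e'' \subseteq e'$, and $\forall T \in \mathcal{T}$, 
$t \in T \Rightarrow t' \in T$, 

- $\mathcal{T} = \{T_f \mid f \in F\}$ where 
$T_f = \{(e, \alpha, e') \mid f \notin e' \text{ or } \exists (\beta, e'') \in \delta(f),\ \alpha \subseteq \beta \text{ and } f \notin e'' \subseteq e'\}$. 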
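The membership test for an acceptance set $T_f$ is easy to get wrong, so a small sketch may help. It assumes the literal-conjunction representation of labels (a frozenset of literals, so that the letter-set inclusion $\alpha \subseteq \beta$ becomes the reverse inclusion on literal sets) and frozensets of state names for conjunctions; the function name `in_T_f` and the string state names are hypothetical.

```python
def in_T_f(f, transition, delta):
    """Return True if the GBA transition (e, alpha, e_next) belongs to T_f.

    delta maps each VWAA state to a set of (label, targets) pairs.
    Labels are frozensets of literals; a label with MORE literals denotes
    FEWER letters, so "alpha included in beta" is written beta <= alpha.
    """
    _e, alpha, e_next = transition
    if f not in e_next:
        return True
    # Otherwise f must be able to leave itself inside this transition.
    return any(beta <= alpha and f not in e2 and e2 <= e_next
               for beta, e2 in delta[f])


fs = frozenset
# VWAA transitions of the state Fp: read p and stop, or read anything and stay.
delta = {"Fp": {(fs({"p"}), fs()), (fs(), fs({"Fp"}))}}

leaves = (fs({"Fp"}), fs({"p"}), fs())                      # Fp is discharged
waits = (fs({"Fp"}), fs(), fs({"Fp"}))                      # Fp only loops
renewed = (fs({"Fp", "GFp"}), fs({"p"}), fs({"Fp", "GFp"}))  # Fp is re-created
print(in_T_f("Fp", leaves, delta), in_T_f("Fp", waits, delta),
      in_T_f("Fp", renewed, delta))
```

The third transition illustrates the second disjunct of the definition: $F p$ still appears in the target, but only because another state re-introduces it, while $F p$ itself could have been fulfilled on a letter containing $p$.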

Remark 3. One may notice that using $f \notin e$ instead of $f \notin e'$ in the definition 
of $T_f$ would have been more intuitive, since it corresponds to the case where in 
the run of $\mathcal{A}$ there is no edge with both ends labeled by $f$. But our definition is 
also correct. The proof of the following main theorem is more complicated with 
this definition but the experimental results are much better with it, especially 
regarding the simplifications. 

Example 6. Figure 2 shows the result of Step 2 on the automaton $\mathcal{A}_\theta$ of Fig. 1. 

Theorem 2. $\mathcal{L}(\mathcal{G}_\mathcal{A}) = \mathcal{L}(\mathcal{A})$. 

Remark 4. This is the point where we need the alternating automaton to be 
very weak (this theorem is false for classical alternating automata). Consider 
an infinite branch in a run of $\mathcal{A}$ on a given word: since $\mathcal{A}$ is very weak, the 
sequence of the labels on this branch is decreasing, and has to be ultimately 
constant since $Q$ is finite. Then “having only a finite number of nodes labeled in 
$F$” is equivalent to “having an infinite number of nodes labeled in $Q \setminus F$”. This 
is crucial in the proof. 

Proof. Let $\sigma = (V, E, \lambda)$ be an accepting run of $\mathcal{A}$ on a word $u = u_0 u_1 \dots$, with 
$V = \bigcup_{i \ge 0} V_i$, $E = \bigcup_{i \ge 0} E_i$ and $E_i \subseteq V_i \times V_{i+1}$. We are first going to build a new 
run of $\mathcal{A}$ on $u$, redefining gradually the sets $V_i$ and $E_i$ to $V'_i \subseteq V_i$ and $E'_i$, $\forall i \ge 0$. 

Let $V'_0 = V_0$. Now suppose that $V'_i$ has been defined. By definition of a run, 
$\forall x \in V'_i$, $\exists \alpha_x$ such that $u_i \in \alpha_x$, and $(\alpha_x, e_x) \in \delta(\lambda(x))$ where $e_x = \lambda(E_i(x))$. 
Let $\alpha = \bigcap_{x \in V'_i} \alpha_x$ and $e = \bigcup_{x \in V'_i} e_x$. 

By definition of $\delta''$, $t = (\lambda(V'_i), \alpha, e)$ is in $\delta''$. There exists a transition 
$t' = (\lambda(V'_i), \alpha', e')$ in $\delta'$ such that $t' \preccurlyeq t$ and $t'$ is minimal. Note that $t'$ is a transition 
of $\mathcal{G}_\mathcal{A}$, and that $u_i \in \alpha \subseteq \alpha'$. Since $t' \in \delta' \subseteq \delta''$, $\forall x \in V'_i$, $\exists (\alpha'_x, e'_x) \in \delta(\lambda(x))$ 
such that $\alpha' = \bigcap_{x \in V'_i} \alpha'_x$ and $e' = \bigcup_{x \in V'_i} e'_x$. 

Moreover $\forall x \in V'_i$ such that $\lambda(x) = f \in F$ and $t' \in T_f$, there exists $(\alpha''_x, e''_x) \in \delta(f)$ 
such that $f \notin e''_x \subseteq e'$ and $\alpha' \subseteq \alpha''_x$. For all other elements $x$ of $V'_i$, let $e''_x = e'_x$ 
and $\alpha''_x = \alpha'_x$. 

Let $V'_{i+1} = \{y \in V_{i+1} \mid \lambda(y) \in e'\}$ and $E'_i = \{(x, y) \in V'_i \times V'_{i+1} \mid \lambda(y) \in e''_x\}$. 
Note that $\lambda(E'_i(x)) = e''_x$ since $e''_x \subseteq e' = \lambda(V'_{i+1})$ and that $E'_i(V'_i)$ may be strictly 
contained in $V'_{i+1}$. 

Claim. $\forall i \ge 0$, $\forall f \in F$, the following property holds: 

if $\exists (x, y) \in E'_i$, $\lambda(x) = \lambda(y) = f$ then $\exists (x, y) \in E_i$, $\lambda(x) = \lambda(y) = f$. 

Proof. If $\forall x \in V'_i$, $\lambda(x) \ne f$ then the claim is true. Otherwise $\exists x \in V'_i \subseteq V_i$, 
$\lambda(x) = f$. Assume that $\exists y \in E'_i(x)$ with $\lambda(y) = f$. Then we have 
$f \in e''_x$, and by definition of $e''_x$ we deduce that $t' \notin T_f$. Since $t' \preccurlyeq t$ we 
have $t \notin T_f$ and we deduce easily that $f \in e_x$, which proves the claim. 

Let $V' = \bigcup_{i \ge 0} V'_i$, $E' = \bigcup_{i \ge 0} E'_i$ and $\lambda'$ be the restriction of $\lambda$ to $V'$. From the 
construction, one can easily see that $\sigma' = (V', E', \lambda')$ is a new run of $\mathcal{A}$ on $u$. We 
show first that $\sigma'$ is an accepting run. Suppose that $\sigma'$ is not accepting: since $\mathcal{A}$ 
is very weak, the labels on an infinite branch of a run are ultimately constant. 
Hence if $\sigma'$ is not accepting, then there exists an infinite branch of $\sigma'$ ultimately 
labeled by some $f \in F$. Using the claim, there exists in $\sigma$ an infinite branch 
which is ultimately labeled by $f$. This is impossible since $\sigma$ is accepting. 

Let $e_i = \lambda(V'_i)$, $\forall i \ge 0$. We have $e_0 = \lambda(V_0) \in I$ and from our construction 
we get $\forall i \ge 0$, $\exists \alpha_i$ such that $u_i \in \alpha_i$ and $(e_i, \alpha_i, e_{i+1})$ is a transition of $\mathcal{G}_\mathcal{A}$. 
Hence $\sigma'' = e_0, e_1, \dots$ is a run of $\mathcal{G}_\mathcal{A}$ on $u$. Now let us prove that $\sigma''$ is accepting. Let 
$i \ge 0$ and $f \in F$. We intend to prove that at some depth $j \ge i$ the transition 
$(e_j, \alpha_j, e_{j+1})$ is in $T_f$. 

If $f \notin e_{i+1}$ then $j = i$ will do. Otherwise let $j > i$ be the smallest depth 
where $(f, f) \notin \lambda(E'_j)$. Note that $j$ exists, otherwise there would be an infinite 
branch in $\sigma'$ ultimately labeled by $f$ and $\sigma'$ would not be accepting. Since we 
know that $f \in e_j$, let $x$ be the node of $V'_j$ labeled by $f$. From our construction 
we know that $\exists (\alpha''_x, e''_x) \in \delta(f)$, $f \notin e''_x \subseteq e_{j+1}$ and $\alpha_j \subseteq \alpha''_x$. We can conclude 
that $(e_j, \alpha_j, e_{j+1})$ is in $T_f$. 

Therefore, from any accepting run $\sigma$ of $\mathcal{A}$, we have built an accepting run $\sigma''$ 
of $\mathcal{G}_\mathcal{A}$ on the same word and we get the first inclusion $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{L}(\mathcal{G}_\mathcal{A})$. 

Conversely let $\sigma' = e_0, e_1, \dots$ be an accepting run of $\mathcal{G}_\mathcal{A}$ on a word 
$u = u_0 u_1 \dots$ Hence $e_0 \in I$ and $\forall i \ge 0$, $\exists \alpha_i$, $u_i \in \alpha_i$ and $(\alpha_i, e_{i+1}) \in \delta'(e_i)$. Let 
$V = \bigcup_{i \ge 0} V_i$ where $V_i = \{(p, i) \mid p \in e_i\}$ and let $\lambda(p, i) = p$ so that $\lambda(V_i) = e_i$. 

By definition of $\delta'$, $\forall x \in V_i$, $\exists (\alpha_x, e_x) \in \delta(\lambda(x))$ such that $e_x \subseteq e_{i+1}$ and 
$\alpha_i \subseteq \alpha_x$. Moreover $\forall f \in F$, if $(e_i, \alpha_i, e_{i+1}) \in T_f$ then either $f \notin e_i$, or $\lambda(x) = f$ 
for some $x$ in $V_i$ and in that case we can choose $\alpha_x$ and $e_x$ such that $f \notin e_x$. Let 
$E$ be defined by $(x, y) \in E$ iff $\exists i \ge 0$, $x \in V_i$, $y \in V_{i+1}$ and $\lambda(y) \in e_x$. 

We can easily see that $\sigma = (V, E, \lambda)$ is a run of $\mathcal{A}$ on $u$. Now suppose that $\sigma$ 
is not accepting: as we proved before, there would exist in $\sigma$ an infinite branch 
with all nodes ultimately labeled by some $f \in F$. But $\sigma'$ is accepting so it has 
infinitely many transitions in $T_f$, and for each such transition there is no edge 
in $E$ with both ends labeled by $f$ at the corresponding depth. Hence this is 
impossible. 

Therefore from any accepting run $\sigma'$ of $\mathcal{G}_\mathcal{A}$, we have built an accepting run 
$\sigma$ of $\mathcal{A}$ on the same word, proving the converse inclusion $\mathcal{L}(\mathcal{G}_\mathcal{A}) \subseteq \mathcal{L}(\mathcal{A})$. □ 
Fig. 3. Automaton after Simplification. 

5 GBA to Büchi Automata (BA) 

At that point we have obtained a GBA for our LTL formula $\varphi$. We simply have 
to transform it into a BA to complete our algorithm. This construction is quite 
easy and well-known, but for the sake of completeness we explain it briefly. We 
will begin by defining a BA, using once more the same modifications concerning 
the alphabet and the transition function. 

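Definition 6. A Büchi automaton is a five-tuple $\mathcal{B} = (Q, \Sigma, \delta, I, F)$ where: 

- $Q$ is the set of states, 

- $\Sigma$ is the alphabet, and we let $\Sigma' \subseteq 2^\Sigma$, 

- $\delta : Q \to 2^{\Sigma' \times Q}$ is the transition function, 

- $I \subseteq Q$ is the set of initial states, 

- $F \subseteq Q$ is the set of repeated states (Büchi condition). 

A run $\sigma$ of $\mathcal{B}$ on a word $u_0 u_1 \dots \in \Sigma^\omega$ is a sequence $q_0, q_1, \dots$ of elements of 
$Q$ such that $q_0 \in I$ and $\forall i \ge 0$, $\exists \alpha_i \in \Sigma'$ such that $u_i \in \alpha_i$ and $(\alpha_i, q_{i+1}) \in \delta(q_i)$. 
A run $\sigma$ is accepting if it visits states of $F$ infinitely often, i.e. $q_i \in F$ for 
infinitely many $i$. $\mathcal{L}(\mathcal{B})$ is the set of words on which there exists an accepting run of $\mathcal{B}$. 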



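Here is the third step of our algorithm, building a BA from a GBA. If $\mathcal{G}$ has 
$n$ states and $r$ acceptance conditions, then $\mathcal{B}_\mathcal{G}$ has at most $(r + 1) \times n$ states. 

Step 3. Let $\mathcal{G} = (Q, \Sigma, \delta, I, \mathcal{T})$ be a GBA with $\mathcal{T} = \{T_1, \dots, T_r\}$. We define the 
BA $\mathcal{B}_\mathcal{G} = (Q \times \{0, \dots, r\}, \Sigma, \delta', I \times \{0\}, Q \times \{r\})$ where: 

- $\delta'((q, j)) = \{(\alpha, (q', j')) \mid (\alpha, q') \in \delta(q) \text{ and } j' = next(j, (q, \alpha, q'))\}$ 

with 

$$next(j, t) = \begin{cases}
\max\{j \le i \le r \mid \forall j < k \le i,\ t \in T_k\} & \text{if } j \ne r \\
\max\{0 \le i \le r \mid \forall 0 < k \le i,\ t \in T_k\} & \text{if } j = r
\end{cases}$$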
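The counter construction of Step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the LTL2BA implementation: the GBA is given as a dictionary from states to sets of `(label, target)` pairs, the acceptance sets $T_1, \dots, T_r$ as a Python list of sets of `(source, label, target)` triples, and only the reachable part of the BA is built. The names `next_level` and `degeneralize` are hypothetical.

```python
def next_level(j, t, acc):
    """next(j, t): largest i such that t is in T_k for every k in (start, i],
    where start is j, or 0 when j == r.  acc[k - 1] holds T_k."""
    r = len(acc)
    i = 0 if j == r else j
    # The condition is prefix-closed, so a greedy scan yields the maximum.
    while i < r and t in acc[i]:
        i += 1
    return i


def degeneralize(delta, init, acc):
    """Build the reachable part of the BA for a GBA (delta, init, acc)."""
    r = len(acc)
    start = {(q, 0) for q in init}
    ba_delta, todo = {}, list(start)
    while todo:
        q, j = todo.pop()
        if (q, j) in ba_delta:
            continue
        succs = {(alpha, (q2, next_level(j, (q, alpha, q2), acc)))
                 for alpha, q2 in delta.get(q, ())}
        ba_delta[(q, j)] = succs
        todo.extend(target for _, target in succs)
    final = {state for state in ba_delta if state[1] == r}
    return ba_delta, start, final


# One-state GBA: the a-loop is in T_1, the b-loop is in T_2.
gba_delta = {"s": {("a", "s"), ("b", "s")}}
gba_acc = [{("s", "a", "s")}, {("s", "b", "s")}]
ba_delta, ba_init, ba_final = degeneralize(gba_delta, {"s"}, gba_acc)
print(sorted(ba_delta), ba_final)
```

In this small example the BA has three states, which matches the bound $(r + 1) \times n$ with $r = 2$ and $n = 1$, and the counter reaches $r$ only after an $a$-transition followed later by a $b$-transition.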



Example 7. Figure 3 shows the result of Step 3 on the automaton of Fig. 2. 

Theorem 3. $\mathcal{L}(\mathcal{B}_\mathcal{G}) = \mathcal{L}(\mathcal{G})$. 

Remark 5. There exist many similar algorithms that transform a GBA into a BA. 
They often consist in building the synchronous product of the GBA with some 
automaton verifying that every acceptance condition is satisfied infinitely often. 
This automaton differs from one algorithm to another. We chose one that gives 
good results for the size of the resulting BA after simplification. 
6 Simplification 

Simplification is really important in our algorithm. Since each step produces a 
new automaton from the result of the previous step, the more we simplify each 
result, the faster our algorithm is and the less memory it uses. 

After each step, we simplify the automaton obtained, using iteratively three 
rules until no more simplification occurs: 

- A state that is not accessible can be removed, 

- If a transition $t_1$ implies a transition $t_2$, then $t_2$ can be removed. 

  $t_1 = (q, \alpha_1, q_1)$ implies $t_2 = (q, \alpha_2, q_2)$ if: 

  In a VWAA,   $\alpha_2 \subseteq \alpha_1$ and $q_1 \subseteq q_2$ 
  In a GBA,    $\alpha_2 \subseteq \alpha_1$, $q_1 = q_2$ and $\forall T \in \mathcal{T}$, $t_2 \in T \Rightarrow t_1 \in T$ 
  In a BA,     $\alpha_2 \subseteq \alpha_1$ and $q_1 = q_2$ 

- If two states $q_1$ and $q_2$ are equivalent, then they can be merged. 

  $q_1$ and $q_2$ are equivalent if: 

  In a VWAA,   $\delta(q_1) = \delta(q_2)$ and $q_1 \in F \iff q_2 \in F$ 
  In a GBA,    $\delta(q_1) = \delta(q_2)$ and $\forall (\alpha, q') \in \delta(q_1)$, $\forall T \in \mathcal{T}$, 
               $(q_1, \alpha, q') \in T \iff (q_2, \alpha, q') \in T$ 
  In a BA,     $\delta(q_1) = \delta(q_2)$ and $q_1 \in F \iff q_2 \in F$ 

Note that for a GBA issued from Step 2, the condition $(q_1, \alpha, q') \in T_f$ does 
not depend on $q_1$ so that the condition simply becomes $\delta(q_1) = \delta(q_2)$. 
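For the simplest case, a BA, the three rules can be sketched as below. The sketch is an a posteriori version applied to a finished automaton, not the on-the-fly variant, and it makes several assumptions of its own: labels are frozensets of literals (so $\alpha_2 \subseteq \alpha_1$ on letter sets means the literals of $\alpha_1$ are a subset of those of $\alpha_2$), states are sortable names, and every state has an entry in `delta`. The function names are hypothetical.

```python
def remove_implied(trans):
    """Drop every transition implied by another one from the same state.

    t1 = (a1, q1) implies t2 = (a2, q2) when q1 == q2 and a1 has strictly
    fewer literals than a2, i.e. a1 accepts strictly more letters.
    """
    return {(a2, q2) for (a2, q2) in trans
            if not any(q1 == q2 and a1 < a2 for (a1, q1) in trans)}


def prune_unreachable(delta, init):
    """Keep only the states accessible from the initial states."""
    seen, todo = set(init), list(init)
    while todo:
        for _, target in delta[todo.pop()]:
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return {q: ts for q, ts in delta.items() if q in seen}


def simplify_ba(delta, init, final):
    """Apply the three rules until no more simplification occurs."""
    delta = prune_unreachable(delta, init)
    delta = {q: remove_implied(ts) for q, ts in delta.items()}
    init, final = set(init), set(final) & set(delta)
    merged = True
    while merged:
        merged = False
        states = sorted(delta)
        for i, q1 in enumerate(states):
            for q2 in states[i + 1:]:
                if delta[q1] == delta[q2] and (q1 in final) == (q2 in final):
                    del delta[q2]       # merge q2 into q1
                    delta = {q: remove_implied(
                                 {(a, q1 if t == q2 else t) for a, t in ts})
                             for q, ts in delta.items()}
                    init = {q1 if q == q2 else q for q in init}
                    final.discard(q2)
                    merged = True
                    break
            if merged:
                break
    return delta, init, final


fs = frozenset
ba = {"a": {(fs(), "b"), (fs({"p"}), "b"), (fs(), "c")},
      "b": {(fs(), "e")}, "c": {(fs(), "e")},
      "d": {(fs(), "a")}, "e": {(fs(), "e")}}
print(simplify_ba(ba, {"a"}, {"e"}))
```

In the example, state `d` is inaccessible, the transition labeled `p` from `a` to `b` is implied by the unlabeled one, and `b` and `c` are equivalent; `b` and `e` have the same transitions but are not merged because only `e` is a repeated state.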

This simplification procedure is very effective at reducing the size of the
automata. But the strength of our algorithm is that the last two simplification rules
are also used on-the-fly: after a transition has been created, it is compared with
the other transitions already computed from the same state, and the ones that
become useless are immediately deleted; after all the transitions of a state have
been created, that state is compared with the other states that have already
been created, and is merged with one of those states if possible. This method is
important since usually many states and transitions can be simplified, and
simplifying them on-the-fly saves a lot of time and space.
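
The two on-the-fly rules can be illustrated, for the BA case only, by the following minimal sketch. It is not the LTL2BA implementation: the representation (a label is the set of letters on which a transition may fire, states are integers) and the function names add_transition and merge_equivalent are assumptions made for this illustration.

```python
def add_transition(outgoing, new):
    """Add `new` = (label, target) to the transitions leaving one state.

    Assumed representation: a label is a frozenset of letters on which the
    transition may fire. BA rule from the text:
    (a1, q1) implies (a2, q2) if a2 <= a1 and q1 == q2.
    """
    label, target = new
    # The new transition is useless if an existing transition implies it.
    if any(t == target and label <= l for l, t in outgoing):
        return list(outgoing)
    # Otherwise delete the existing transitions that the new one implies.
    kept = [(l, t) for l, t in outgoing if not (t == target and l <= label)]
    kept.append(new)
    return kept


def merge_equivalent(delta, accepting, initial):
    """Merge BA states with equal outgoing transitions and equal acceptance."""
    delta = {q: frozenset(ts) for q, ts in delta.items()}
    accepting = set(accepting)
    merged = True
    while merged:                      # iterate until no rule applies
        merged = False
        seen = {}
        for q in sorted(delta):
            key = (delta[q], q in accepting)
            if key not in seen:
                seen[key] = q
                continue
            rep = seen[key]            # q is equivalent to rep: redirect and drop q
            delta = {p: frozenset((l, rep if t == q else t) for l, t in ts)
                     for p, ts in delta.items() if p != q}
            accepting.discard(q)
            initial = rep if initial == q else initial
            merged = True
            break
    return delta, accepting, initial


a, b, ab = frozenset("a"), frozenset("b"), frozenset("ab")
ts = []
for t in [(a, 1), (ab, 1), (b, 1), (a, 2)]:
    ts = add_transition(ts, t)
d, acc, init = merge_equivalent(
    {0: {(a, 1), (a, 2)}, 1: {(ab, 0)}, 2: {(ab, 0)}}, {1, 2}, 0)
```

In the example, the transition on {a} to state 1 is deleted as soon as the transition on {a, b} to state 1 is created, and states 1 and 2 are merged because they have the same outgoing transitions and the same acceptance status.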

In Table 0 the results of the algorithm with or without on-the-fly simplifi- 
cation are compared (LTL2BA- is our algorithm with a posteriori simplification 
only). For the formula defined in (P), the unsimplified GBA has $2^{n+1}$ states,
whereas the simplified GBA has only 2 states. Using on-the-fly simplification
avoids the intermediate exponential automaton, which explains the great
improvement, even if the time and memory used by LTL2BA are still exponential.

7 Experimental Results 

In this section we compare the results of some recent algorithms transforming 
an LTL formula into a BA. 

Spin is a model checker developed by Bell Labs since 1980. It contains an
algorithm transforming an LTL formula into a BA, presented in [3]. The program
is written in C, and we used version 3.4.1 (released Aug 2000).
Wring is an algorithm presented in The program is written in Perl,
so the comparison with our work cannot be read literally, and the amount of 
memory used is just an approximation we made using the Unix command ‘top’. 

EQLTL is an algorithm presented in [2]. The program is not publicly available,
but a demo is offered on the web. All we could do was to measure the
time needed by the web interface to start responding to our request. We do not
even know what type of machine handles the request. Consequently, the times
we give should be taken with caution.

LTL2BA is a program written in C, like Spin, so that the two programs can be
compared reliably. LTL2BA- is the same program, with a
posteriori simplification only.

Tests were made on a Sun Ultra 10 station with 1 GB of RAM. 

As explained in the introduction, we compared the tools on usual LTL for- 
mulae, taking the example of the formula defined in (P|). The result of the 
comparison is detailed in Table H 

Another common type of LTL formula, often encountered in model checking,
is formulae like $\varphi_n = \neg(p_1\ U\ (p_2\ U\ (\ldots\ U\ p_n)\ldots))$. We ran the same
tests on these formulae; the results are in Table 2. Again our algorithm outperforms the other ones.
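
For readers who want to reproduce such a test, the family of nested-until formulae can be generated mechanically. The sketch below builds the formula as a string; the ASCII spelling of the operators ("!" for negation, "U" for until) and the function name phi are assumptions of this illustration, and each tool may expect a different input syntax.

```python
def phi(n):
    """Return the string for not (p1 U (p2 U (... U pn))), for n >= 1."""
    inner = f"p{n}"
    # Wrap from the innermost proposition outwards.
    for i in range(n - 1, 0, -1):
        inner = f"(p{i} U {inner})"
    return f"!{inner}"


formulae = [phi(n) for n in range(2, 9)]   # the range 2 <= n <= 8 used in the tests
```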



Table 2. Comparison on the Formulae $\varphi_n$ for $2 \le n \le 8$. Time is in sec. Space
in kB.

                   Spin               Wring         EQLTL       LTL2BA
               time     space     time    space     time     time   space
$\varphi_2$    0.01         8     0.07    4,100        8     0.01     3.2
$\varphi_3$    0.03       110     0.29    4,100        8     0.01     5.5
$\varphi_4$    0.75     1,700     1.34    4,200        9     0.01      11
$\varphi_5$      43    51,000       10    4,200       11     0.01      13
$\varphi_6$   1,200   920,000       92    4,500       15     0.15      25
$\varphi_7$       -         -      720    6,000       27      9.2      48
$\varphi_8$       -         -        -        -       92    1,200      93




We also compared the algorithms on random LTL formulae of a fixed size, 
using a tool presented in HH- For compatibility reasons, the only comparison 
we could carry out was between our algorithm and Spin’s. Here the results
come from a test on 200 random formulae of size 10, where both algorithms are
compared on the same formulae. See Table 3 for details.



References 

1. M. Daniele, F. Giunchiglia, and M. Vardi. Improved automata generation for 
linear temporal logic. In Proc. 11th International Computer Aided Verification 
Conference, pages 249-260, 1999. 

2. K. Etessami and G. Holzmann. Optimizing Büchi automata. In Proceedings of
11th Int. Conf. on Concurrency Theory (CONCUR), 2000. 






Table 3. Comparison on Random Formulae of a Fixed Size. 





Spin 


LTL2BA 


avg. 


max. 
Abstract. In formal verification, we verify that a system is correct with respect
to a specification. When verification succeeds and the system is proven to be 
correct, there is still a question of how complete the specification is, and whether 
it really covers all the behaviors of the system. In this paper we study coverage 
metrics for model checking from a practical point of view. Coverage metrics are 
based on modifications we apply to the system in order to check which parts of it 
were actually relevant for the verification process to succeed. We suggest several 
definitions of coverage, suitable for specifications given in linear temporal logic 
or by automata on infinite words. We describe two algorithms for computing the 
parts of the system that are not covered by the specification. The first algorithm is 
built on top of automata-based model-checking algorithms. The second algorithm 
reduces the coverage problem to the model-checking problem. Both algorithms 
can be implemented on top of existing model checking tools. 



1 Introduction 

In model checking [CE81,QS81,LP85], we verify the correctness of a finite-state system 
with respect to a desired behavior by checking whether a Kripke structure that models 
the system satisfies a specification of this behavior, expressed in terms of a temporal 
logic formula or a finite automaton [CGP99]. Beyond being fully-automatic, an addi- 
tional attraction of model-checking tools is their ability to accompany a negative answer 
to the correctness query by a counterexample to the satisfaction of the specification in 
the system. Thus, together with a negative answer, the model checker returns some er- 
roneous execution of the system. These counterexamples are very important and they 
can be essential in detecting subtle errors in complex designs [CGMZ95]. On the other
hand, when the answer to the correctness query is positive, most model-checking tools
terminate with no further information to the user. Since a positive answer means that the
system is correct with respect to the specification, this at first seems like a reasonable
policy. In the last few years, however, there has been growing awareness of the
importance of suspecting the system of containing an error even when model checking
succeeds. The main justifications for such suspicion are possible errors in the modeling of
the system or of the behavior, and possible incompleteness in the specification.

and by a grant from the Intel Corporation.

There are various ways to look for possible errors in the modeling of the sys- 
tem or the behavior. One way is to detect vacuous satisfaction of the specification 
[BBER97,KV99], where cases like antecedent failure [BB94] make parts of the
specification irrelevant to its satisfaction. For example, the specification
$\varphi = G(req \rightarrow F\,grant)$ is vacuously satisfied in a system in which req is always false. A similar way
is to check the validity of the specification. Clearly, a valid specification is satisfied 
trivially, and suggests some problem. A related approach is taken in the process of con- 
straint validation in the verification tool FormalCheck [Kur98], where sanity checks 
include a search for enabling conditions that are never enabled, and a replacement of 
all or some of the constraints by false. FormalCheck also keeps track of variables and 
values of variables that were never used in the process of model checking. 

It is less clear how to check completeness of the specification. Indeed, specifications 
are written manually, and their completeness depends on the competence of the person 
who writes them. The motivation for such a check is clear: an erroneous behavior of the 
system can escape the verification efforts if this behavior is not captured by the speci- 
fication. In fact, it is likely that a behavior that is not captured by the specification also 
escapes the attention of the designer, who is often the one to provide the specification. 

In simulation-based verification techniques, coverage metrics are used in order to 
reveal states that were not visited during the testing procedure (i.e., not “covered” by this
procedure) [HMA95,HYHD95,DGK96,HH96,KN96,FDK98,MAH98,BH99,FAD99]. 
These metrics are a useful way of measuring progress of the verification process. How- 
ever, the same intuition cannot be applied to model checking because the process of 
model checking visits all states. We can say that in testing, a state is “uncovered” if it 
is not essential to the success of the testing procedure. A similar idea can be applied
to model checking, where the state is defined as “uncovered” if its labeling is not es- 
sential to the success of the model checking process. This approach was first suggested 
by Hoskote et al. [HKHZ99]. Low coverage can point to several problems. One possi- 
bility is that the specification is not complete enough to fully describe all the possible 
behaviors of the system. Then, the output of a coverage check is helpful in completing 
the specification. Another possibility is that the system contains redundancies. Then, 
the output of the coverage check is helpful in simplifying the system. 

There are two different approaches to coverage in model checking. One approach, 
introduced by Katz et al. [KGG99], states that a well-covered system should closely 
resemble the tableau of its specification, thus the coverage criteria of [KGG99] are 
based on the analysis of the differences between the system and the tableau of its spec- 
ification. We find the approach of [KGG99] too strict - we want specifications to be 
much more abstract than their implementations. In addition, the approach is restricted 
to universal safety specifications, whose tableaux have no fairness constraints, and it is 
computationally hard to compute the coverage criteria. Another approach, introduced
in [HKHZ99], is to check the influence of small changes in the system on the
satisfaction of the specification. Intuitively, if a part of the system can be changed without
violating the specification, this part is uncovered by the specification. Formally, for a
Kripke structure $K$, a state $w$ in $K$, and an observable signal $q$, the dual structure
$\tilde{K}_{w,q}$ is obtained from $K$ by flipping the value of $q$ in $w$ (the signal $q$ corresponds to
a Boolean variable that is true if $w$ is labeled with $q$ and is false otherwise; when we
say that we flip the value of $q$, we mean that we switch the value of this variable). For
a specification $\varphi$, Hoskote et al. define the set $q$-cover$(K, \varphi)$ as the set of states $w$ such
that $\tilde{K}_{w,q}$ does not satisfy $\varphi$. A state is covered if it belongs to $q$-cover$(K, \varphi)$ for some
observable signal $q$. Indeed, this indicates that the value of $q$ in $w$ is crucial for the
satisfaction of $\varphi$ in $K$. It is easy to see that for each observable signal, the set of
covered states can be computed by a naive algorithm that performs model checking of $\varphi$
in $\tilde{K}_{w,q}$ for each state $w$ of $K$. The naive algorithm, however, is very expensive, and is
useless for practical applications.* In [CKV01], we suggested two alternatives to the
naive algorithm for specifications in the branching time temporal logic CTL. The first
algorithm is symbolic and it computes the set of pairs $\langle w, w' \rangle$ such that flipping the
value of $q$ in $w'$ falsifies $\varphi$ in $w$. The second algorithm improves the naive algorithm
by exploiting overlaps in the many dual structures that we need to check. The two
algorithms are still not attractive: the symbolic algorithm doubles the number of BDD
variables, and the second algorithm requires the development of new procedures. Also,
these algorithms cannot be extended to specifications in LTL, as they heavily use the
fixed-point characterization of CTL, which is not applicable to LTL.

In this paper we study coverage metrics for model checking from a practical point 
of view. First, we consider specifications given as formulas in the linear temporal logic 
LTL or by automata on infinite words. These formalisms are used in many
model-checking tools (e.g., [HHK96,Kur98]), and we suggest alternative definitions of
coverage, which better suit the linear case. Second, we describe two algorithms for LTL
specifications. Both algorithms can be relatively easily implemented on top of existing 
model checking tools. 

Let us describe informally our alternative definitions. For a Kripke structure $K$, let
$\mathcal{K}$ be the unwinding of $K$ to an infinite tree. Recall that a dual structure is obtained
in [HKHZ99,CKV01] by flipping the value of the signal $q$ in the state $w$ of $K$. A state
$w$ of $K$ may correspond to many $w$-nodes in $\mathcal{K}$. The definition of coverage that refers
to $\tilde{K}_{w,q}$ flips the value of $q$ in all the $w$-nodes in $\mathcal{K}$. We call this structure coverage.
Alternatively, we can also examine node coverage, where we flip the value of $q$ in a
single $w$-node in $\mathcal{K}$, and tree coverage, where we flip the value of $q$ in some $w$-nodes.
Each approach measures a different sensitivity of the satisfaction of the specification to


* Hoskote et al. describe an alternative algorithm that is symbolic and runs in linear time, but
their algorithm handles specifications in a very restricted syntax (a fragment of the universal
fragment $\forall$CTL of CTL) and it does not return the set $q$-cover$(K, \varphi)$, but a set that corresponds
to a different definition of coverage, which is sometimes counter-intuitive. For example, the
algorithm is syntax-dependent, thus, equivalent formulas may induce different coverage sets;
in particular, the set of states $q$-covered by the tautology $q \rightarrow q$ is the set of states that satisfy
$q$, rather than the empty set, which meets our intuition of coverage.
changes in the system. Intuitively, in structure coverage we check whether the value of
q in all the occurrences of w has been crucial for the satisfaction of the specification. On 
the other hand, in node coverage we check whether the value of q in some occurrence 
of w has been crucial for the satisfaction of the specification^. 

The first algorithm we describe computes the set of node-covered states and is built
on top of automata-based model-checking algorithms. In automata-based model
checking, we translate an LTL specification $\varphi$ to a nondeterministic Büchi automaton $A_{\neg\varphi}$
that accepts all words that do not satisfy $\varphi$ [VW94]. Model checking of $K$ with respect
to $\varphi$ can then be reduced to checking the emptiness of the product $K \times A_{\neg\varphi}$. When $K$
satisfies $\varphi$, the product is empty. A state $w$ is covered iff flipping the value of $q$ in $w$
makes the product nonempty. This observation enables us to compute the set of
node-covered states by a simple manipulation of the set of reachable states in the product
$K \times A_{\neg\varphi}$, and the set of states in this product from which a fair path exists.
Fortunately, these sets have already been calculated in the process of model checking. We
describe an implementation of this algorithm in the tool COSPAN, which is the engine
of FormalCheck [HHK96,Kur98]. We also describe the changes in the implementation
that are required in order to adapt the algorithm to handle structure and tree coverage.

In the second algorithm we reduce the coverage problem to model checking. Given
an LTL specification $\varphi$ and an observable signal $q$, we construct an indicator formula
$Ind_q(\varphi)$, such that for every structure $K$ and state $w$ in $K$, the state $w$ is node $q$-covered
by $\varphi$ iff $w$ satisfies $Ind_q(\varphi)$. The indicator formulas we construct are in $\mu$-calculus with
both past and future modalities, their length is, in the worst case, exponential in the size
of the specification $\varphi$, they are of alternation depth two for general LTL specifications,
and are alternation free for safety LTL specifications. We note that the exponential
blow-up may not appear in practice. Also, tools that support symbolic model checking of
$\mu$-calculus with future modalities can be extended to handle past modalities with no
additional cost [KP95]. In the full version of the paper we show that bisimilar states
may not agree on their coverage, which is why the indicators we construct require both
past and future modalities.

The two algorithms that we present in this paper are derived from the two possible 
approaches to linear-time model checking. The first approach is to analyze the product 
of the system with the automaton of the negation of the property. The second approach 
is to translate the property to a $\mu$-calculus formula and then check the system with
respect to this formula. Both approaches may involve exponential blow-up. In the first
approach, the size of the automaton can be exponential in the size of the property, and
in the second approach the size of the $\mu$-calculus formula can be exponential in the size
of the property. 

2 Preliminaries 

2.1 Structures and Trees 

We model systems by Kripke structures. A Kripke structure $K = \langle AP, W, R, w_{in}, L \rangle$
consists of a set $AP$ of atomic propositions, a set $W$ of states, a total transition relation

^ As we show in Section 2.2, this intuition is not quite precise, and node coverage does not imply 
structure coverage, which is why tree coverage is required. 
$R \subseteq W \times W$, an initial state $w_{in} \in W$, and a labeling function $L : W \to 2^{AP}$.
If $R(w, w')$, we say that $w'$ is a successor of $w$. For a state $w \in W$, a $w$-path
$\pi = w_0, w_1, \ldots$ in $K$ is a sequence of states in $K$ such that $w_0 = w$ and for all $i \ge 0$, we have
$R(w_i, w_{i+1})$. If $w_0 = w_{in}$, the path $\pi$ is called an initialized path. The labeling function
$L$ can be extended to paths in a straightforward way, thus $L(\pi) = L(w_0) \cdot L(w_1) \cdots$
is an infinite word over the alphabet $2^{AP}$. A fair Kripke structure is a Kripke structure
augmented with a fairness constraint. We consider here the Büchi fairness condition.
There, $K = \langle AP, W, R, w_{in}, L, \alpha \rangle$, where $\alpha \subseteq W$ is a set of fair states. A path of $K$ is
fair if it visits states in $\alpha$ infinitely often. Formally, let $inf(\pi)$ denote the set of states
repeated in $\pi$ infinitely often. Thus, $w \in inf(\pi)$ iff $w_i = w$ for infinitely many $i$'s.
Then, $\pi$ is fair iff $inf(\pi) \cap \alpha \ne \emptyset$. The language of $K$, denoted $\mathcal{L}(K)$, is the set of
words $L(\pi)$ for the initialized fair paths $\pi$ of $K$. Often, it is convenient to have several
initial states in $K$. Our results hold also for this model.
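
The definitions above can be made concrete with a small sketch. It restricts attention to ultimately periodic ("lasso") paths, given as a finite prefix followed by a loop that repeats forever, for which $inf(\pi)$ is exactly the set of states on the loop. The class name FairKripke, the helper names, and the two-state example are assumptions of this illustration, not part of the paper.

```python
from dataclasses import dataclass


@dataclass
class FairKripke:
    states: set    # W
    trans: set     # R, as a set of pairs (w, w')
    init: str      # w_in
    label: dict    # L, mapping each state to a frozenset of atomic propositions
    fair: set      # alpha, the set of fair states


def is_initialized_path(k, prefix, loop):
    """Check that prefix . loop^omega is a path of k that starts in w_in."""
    seq = list(prefix) + list(loop) + [loop[0]]   # close the loop once
    return seq[0] == k.init and all((u, v) in k.trans for u, v in zip(seq, seq[1:]))


def is_fair(k, loop):
    """For a lasso path, inf(pi) is the set of loop states."""
    return bool(set(loop) & k.fair)


def word(k, prefix, loop):
    """The labeling L extended to a lasso path."""
    return [k.label[w] for w in prefix], [k.label[w] for w in loop]


K = FairKripke(
    states={"a", "b"},
    trans={("a", "b"), ("b", "a"), ("b", "b")},   # total: every state has a successor
    init="a",
    label={"a": frozenset({"q"}), "b": frozenset()},
    fair={"a"},
)
```

In this example the path a, b, b, b, ... is initialized but not fair, since it visits the only fair state a once, whereas the path a, b, a, b, ... is both initialized and fair.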

For a finite set $\Upsilon$, an $\Upsilon$-tree $T$ is a set $T \subseteq \Upsilon^*$ such that if $x \cdot \upsilon \in T$ where $x \in \Upsilon^*$
and $\upsilon \in \Upsilon$, then also $x \in T$. The elements of $T$ are called nodes and the empty word $\varepsilon$
is the root of $T$. For every $x \in T$, the nodes $x \cdot \upsilon \in T$ where $\upsilon \in \Upsilon$ are the children of
$x$. Each node $x$ of $T$ has a direction in $\Upsilon$. The direction of the root is some designated
member of $\Upsilon$, denoted by $\upsilon_0$. The direction of a node $x \cdot \upsilon$ is $\upsilon$. We denote by $dir(x)$
the direction of node $x$. A node $x$ such that $dir(x) = \upsilon$ is called a $\upsilon$-node. A path $\rho$ of a
tree $T$ is a set $\rho \subseteq T$ such that $\varepsilon \in \rho$ and for every $x \in \rho$ there exists a unique $\upsilon \in \Upsilon$
such that $x \cdot \upsilon \in \rho$. For an alphabet $\Sigma$, a $\Sigma$-labeled $\Upsilon$-tree is a pair $\langle T, V \rangle$, where
$V : T \to \Sigma$ labels each node of $T$ with a letter from $\Sigma$.

A Kripke structure $K$ can be unwound into an infinite computation tree in a
straightforward way. Formally, the tree that is obtained by unwinding $K$ is denoted by $\mathcal{K}$ and is
the $2^{AP}$-labeled $W$-tree $\langle T^K, V^K \rangle$, where $\varepsilon \in T^K$ and $dir(\varepsilon) = w_{in}$, for all $x \in T^K$
and $v \in W$ with $R(dir(x), v)$, we have $x \cdot v \in T^K$, and for all $x \in T^K$, we have
$V^K(x) = L(dir(x))$. That is, $V^K$ maps a node that was reached by taking the
direction $w$ to $L(w)$.
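
Since the computation tree is infinite, only a finite part of it can be built explicitly. The sketch below unwinds a structure up to a given depth; a node is represented as a tuple of directions, with the empty tuple as the root. The function name unwind, the successor-map representation, and the example are assumptions of this illustration.

```python
def unwind(succ, w_in, label, depth):
    """Return {node: label} for all nodes of the unwinding of length <= depth.

    succ maps each state to the set of its successors (the relation R),
    and label maps each state to its set of atomic propositions (L).
    """
    def direction(x):
        # dir(root) = w_in, and dir(x . v) = v
        return x[-1] if x else w_in

    tree = {(): label[w_in]}
    frontier = [()]
    for _ in range(depth):
        next_frontier = []
        for x in frontier:
            for v in sorted(succ[direction(x)]):
                child = x + (v,)
                tree[child] = label[v]     # V(x . v) = L(v)
                next_frontier.append(child)
        frontier = next_frontier
    return tree


succ = {"a": {"a", "b"}, "b": {"a"}}
label = {"a": frozenset({"q"}), "b": frozenset()}
tree = unwind(succ, "a", label, depth=2)
b_nodes = [x for x in tree if x and x[-1] == "b"]   # the b-nodes up to depth 2
```

Already at depth 2 the state b corresponds to two different b-nodes, which is what makes the distinction between flipping in a state and flipping in a node meaningful.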

2.2 Coverage 

Given a system and a formula that is satisfied in this system, we check the influence of 
modifications in the system on the satisfaction of the formula. Intuitively, a state is cov- 
ered if a modification in this state falsifies the formula in the initial state of the structure. 
We limit ourselves to modifications that flip the value of one atomic proposition (an ob- 
servable signal) in one state of the structure^. Flipping can be performed in different 
ways. Through the execution of the system we can visit a state several times, each time 
in a different context. This gives rise to a distinction between “flipping always”, “flip- 
ping once”, and “flipping sometimes”, which we formalize in the definitions of structure 
coverage, node coverage, and tree coverage below. We first need some notations. 

For a domain $Y$, a function $V : Y \to 2^{AP}$, an observable signal $q \in AP$, and a set
$X \subseteq Y$, the dual function $\tilde{V}_{X,q} : Y \to 2^{AP}$ is such that $\tilde{V}_{X,q}(x) = V(x)$ for all $x \notin X$,

^ In [CKV01], we consider richer modifications (e.g., modifications that change both the
labeling and the transitions), and show how the algorithms described there for the limited case can
be extended to handle richer modifications. Extending the algorithms described in this paper 
to richer modifications is nontrivial. 
$\tilde{V}_{X,q}(x) = V(x) \setminus \{q\}$ if $x \in X$ and $q \in V(x)$, and $\tilde{V}_{X,q}(x) = V(x) \cup \{q\}$ if $x \in X$
and $q \notin V(x)$. When $X = \{x\}$ is a singleton, we write $\tilde{V}_{x,q}$. For a Kripke structure
$K = \langle AP, W, R, w_{in}, L \rangle$, an observable signal $q \in AP$, and a state $w \in W$, we denote
by $\tilde{K}_{w,q}$ the structure obtained from $K$ by flipping the value of $q$ in $w$. Thus, $\tilde{K}_{w,q} =
\langle AP, W, R, w_{in}, \tilde{L}_{w,q} \rangle$, where $\tilde{L}_{w,q}(v) = L(v)$ for $v \ne w$, $\tilde{L}_{w,q}(w) = L(w) \cup \{q\}$ in
case $q \notin L(w)$, and $\tilde{L}_{w,q}(w) = L(w) \setminus \{q\}$ in case $q \in L(w)$. For $X \subseteq T^K$ we denote
by $\tilde{\mathcal{K}}_{X,q}$ the tree that is obtained by flipping the value of $q$ in all the nodes in $X$. Thus,
$\tilde{\mathcal{K}}_{X,q} = \langle T^K, \tilde{V}^K_{X,q} \rangle$. When $X = \{x\}$ is a singleton, we write $\tilde{\mathcal{K}}_{x,q}$.
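
The dual function amounts to a symmetric difference with $\{q\}$ on the elements of $X$. A minimal sketch, assuming a labeling stored as a dictionary from domain elements to frozensets of atomic propositions (the name dual is hypothetical):

```python
def dual(labeling, X, q):
    """Return the labeling with the value of q flipped on every element of X."""
    # lab ^ {q} removes q when it is present and adds it when it is absent.
    return {x: (lab ^ {q} if x in X else lab) for x, lab in labeling.items()}


L = {"w0": frozenset({"p", "q"}), "w1": frozenset({"p"}), "w2": frozenset({"q"})}
flipped = dual(L, {"w0", "w1"}, "q")
```

The same function serves for the dual structure (take the domain to be the states, with $X = \{w\}$) and for the dual tree (take the domain to be the tree nodes). Flipping twice on the same set restores the original labeling.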

Definition 1. Consider a Kripke structure $K$, a formula $\varphi$ satisfied in $K$, and an
observable signal $q \in AP$.

- A state $w$ of $K$ is structure $q$-covered by $\varphi$ iff the structure $\tilde{K}_{w,q}$ does not satisfy $\varphi$.

- A state $w$ of $K$ is node $q$-covered by $\varphi$ iff there is a $w$-node $x$ in $T^K$ such that $\tilde{\mathcal{K}}_{x,q}$
does not satisfy $\varphi$.

- A state $w$ of $K$ is tree $q$-covered by $\varphi$ iff there is a set $X$ of $w$-nodes in $T^K$ such
that $\tilde{\mathcal{K}}_{X,q}$ does not satisfy $\varphi$.

Note that, alternatively, a state is structure $q$-covered iff $\tilde{\mathcal{K}}_{X,q}$ does not satisfy $\varphi$ for
the set $X$ of all $w$-nodes in $\mathcal{K}$. In other words, a state $w$ is structure $q$-covered if flipping
the value of $q$ in all the instances of $w$ in $\mathcal{K}$ falsifies $\varphi$, it is node $q$-covered if a single
flip of the value of $q$ falsifies $\varphi$, and it is tree $q$-covered if some flips of the value of $q$
falsify $\varphi$.

For a Kripke structure $K = \langle AP, W, R, w_{in}, L \rangle$, an LTL formula $\varphi$, and an
observable signal $q \in AP$, we use $SC(K, \varphi, q)$, $NC(K, \varphi, q)$, and $TC(K, \varphi, q)$ to denote
the sets of states that are structure $q$-covered, node $q$-covered, and tree $q$-covered,
respectively, in $K$.

Membership of a given state $w$ in each of the sets above can be decided by
running an LTL model checking algorithm on modified structures. For $SC(K, \varphi, q)$, we
have to model check $\tilde{K}_{w,q}$. For $NC(K, \varphi, q)$ and $TC(K, \varphi, q)$, things are a bit more
complicated, as we have to model check several (possibly infinitely many) trees. Since,
however, the set of computations in these trees is a modification of the language of $K$,
it is possible to obtain these computations by modifying $K$ as follows. For tree
coverage, we model check the formula $\varphi$ in the structure obtained from $K$ by adding a copy
$w'$ of the state $w$ in which $q$ is flipped. Node coverage is similar, only that we have to
ensure that the state $w'$ is visited only once, which can be done by adding a copy of $K$
to which we move after a visit in $w'$. It follows that the sets $SC(K, \varphi, q)$, $NC(K, \varphi, q)$,
and $TC(K, \varphi, q)$ can be computed by a naive algorithm that runs the above checks $|W|$
times, one time for each state $w$. In Sections 3 and 4 we describe two alternatives to this
naive algorithm.

We now study the relation between the three definitions. It is easy to see that
structure and node coverage are special cases of tree coverage, thus $SC(K, \varphi, q) \subseteq
TC(K, \varphi, q)$ and $NC(K, \varphi, q) \subseteq TC(K, \varphi, q)$ for all $K$, $\varphi$, and $q$. The relation
between structure coverage and node coverage, however, is not so obvious. Intuitively, in
structure coverage we check whether the value of $q$ in all the occurrences of $w$ has been
crucial for the satisfaction of the specification. On the other hand, in node coverage we
check whether the value of $q$ in some occurrence of $w$ has been crucial for the
satisfaction of the specification. It may therefore seem that node coverage induces bigger
covered sets. The following example shows that in the general case neither one of the
covered sets $SC(K, \varphi, q)$ and $NC(K, \varphi, q)$ is a subset of the other. Let $K$ be a Kripke
structure with one state $w$, labeled $q$, with a self-loop. Let $\varphi_1 = Fq$. It is easy to see that
$\tilde{K}_{w,q}$ does not satisfy $\varphi_1$. On the other hand, $\mathcal{K}$ is an infinite tree that is labeled with $q$
everywhere, thus $\tilde{\mathcal{K}}_{x,q}$ satisfies $\varphi_1$ for every node $x$. So, $w$ is structure $q$-covered, but
not node $q$-covered. Now, let $\varphi_2 = Gq \vee G\neg q$. It is easy to see that $\tilde{K}_{w,q}$ satisfies $\varphi_2$.
On the other hand, $\tilde{\mathcal{K}}_{x,q}$ is a tree that is labeled with $q$ in all nodes $y \ne x$, thus $\tilde{\mathcal{K}}_{x,q}$ does
not satisfy $\varphi_2$. So, $w$ is node $q$-covered (and hence tree $q$-covered), but it is not structure
$q$-covered. As a corollary, we get the following theorem.

Theorem 1. There is a Kripke structure $K$, LTL formulas $\varphi_1$ and $\varphi_2$, and an
observable signal $q$ such that $SC(K, \varphi_1, q) \not\subseteq NC(K, \varphi_1, q)$ and $NC(K, \varphi_2, q) \not\subseteq
SC(K, \varphi_2, q)$.
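
The example behind Theorem 1 can be checked mechanically. In the one-state structure, the computation tree is a single path, so each dual is an ultimately periodic word over the truth value of $q$: the structure dual is "q false forever", and the node dual that flips the node at depth $k$ is "q true $k$ times, false once, then true forever". The sketch below hard-codes the semantics of the two formulas on such lasso words; all function names are assumptions of this illustration, and the node-coverage check samples finitely many depths (for $Fq$ the loop of every node dual contains $q$, so the formula holds at every depth).

```python
def holds_F_q(prefix, loop):
    """F q on the word prefix . loop^omega: q is true at some position."""
    return any(prefix) or any(loop)


def holds_G_q(prefix, loop):
    return all(prefix) and all(loop)


def holds_G_not_q(prefix, loop):
    return not any(prefix) and not any(loop)


def phi1(prefix, loop):                       # phi_1 = F q
    return holds_F_q(prefix, loop)


def phi2(prefix, loop):                       # phi_2 = G q or G not q
    return holds_G_q(prefix, loop) or holds_G_not_q(prefix, loop)


structure_dual = ([], [False])                # q flipped in every occurrence of w


def node_dual(k):
    return ([True] * k + [False], [True])     # q flipped only at depth k


def structure_covered(phi):
    return not phi(*structure_dual)


def node_covered(phi, depths=range(5)):
    return any(not phi(*node_dual(k)) for k in depths)


results = {
    "phi1": (structure_covered(phi1), node_covered(phi1)),
    "phi2": (structure_covered(phi2), node_covered(phi2)),
}
```

The pair for $\varphi_1$ comes out as (structure covered, not node covered) and the pair for $\varphi_2$ as (not structure covered, node covered), matching the two non-inclusions of the theorem.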



2.3 Automata 

A nondeterministic Büchi automaton over infinite words is $A = \langle \Sigma, S, \delta, S_0, \alpha \rangle$, where
$\Sigma$ is an alphabet, $S$ is a set of states, $\delta : S \times \Sigma \to 2^S$ is a transition relation, $S_0 \subseteq S$
is a set of initial states, and $\alpha \subseteq S$ is the set of accepting states. Given an infinite word
$\tau = \sigma_0 \cdot \sigma_1 \cdots$ in $\Sigma^\omega$, a run $r$ of $A$ on $\tau$ is an infinite sequence of states $s_0, s_1, s_2, \ldots$
such that $s_0 \in S_0$ and for all $i \ge 0$, we have $s_{i+1} \in \delta(s_i, \sigma_i)$. The set $inf(r)$ is the
set of states that appear in $r$ infinitely often. Thus, $s \in inf(r)$ iff $s_i = s$ for infinitely
many $i$'s. The run $r$ is accepting iff $inf(r) \cap \alpha \ne \emptyset$ [Büc62]. That is, a run is accepting
iff it visits some accepting state infinitely often. The language of $A$, denoted $\mathcal{L}(A)$, is
the set of infinite words $\tau \in \Sigma^\omega$ such that there is an accepting run of $A$ on $\tau$. Finally,
for $s \in S$, we define $A^s = \langle \Sigma, S, \delta, \{s\}, \alpha \rangle$ as $A$ with initial set $\{s\}$.
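
For an ultimately periodic word, the acceptance condition above can be decided by a finite search. The sketch below assumes the word is given as a finite prefix and a nonempty loop, and that $\delta$ is stored as a dictionary from (state, letter) pairs to sets of states; the function name accepts and the example automaton (which accepts the words with infinitely many $q$'s) are assumptions of this illustration.

```python
def accepts(delta, init, acc, prefix, loop):
    """Decide whether the automaton has an accepting run on prefix . loop^omega."""
    # States the automaton may be in after reading the prefix.
    current = set(init)
    for letter in prefix:
        current = {t for s in current for t in delta.get((s, letter), ())}

    n = len(loop)

    def succ(node):
        # A node pairs an automaton state with a position in the loop.
        s, i = node
        return {(t, (i + 1) % n) for t in delta.get((s, loop[i]), ())}

    def reach_plus(starts):
        # Nodes reachable from `starts` in at least one step.
        seen, todo = set(), list(starts)
        while todo:
            for m in succ(todo.pop()):
                if m not in seen:
                    seen.add(m)
                    todo.append(m)
        return seen

    start = {(s, 0) for s in current}
    reachable = start | reach_plus(start)
    # Accepting run exists iff some reachable node with an accepting state lies on a cycle.
    return any(node[0] in acc and node in reach_plus([node]) for node in reachable)


Q, E = frozenset({"q"}), frozenset()          # the two letters of 2^{q}
delta = {(0, Q): {1}, (0, E): {0}, (1, Q): {1}, (1, E): {0}}
init, acc = {0}, {1}
```

With this automaton, the word that alternates between the letters {q} and the empty set is accepted, while the word that reads {q} once and then the empty set forever is rejected, since state 1 is then visited only once.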

We assume that specifications are given either by LTL formulas or by
nondeterministic Büchi automata. It is shown in [VW94] that given an LTL formula $\varphi$, we can
construct a nondeterministic Büchi automaton $A_\varphi$ over the alphabet $2^{AP}$ such that $A_\varphi$
accepts exactly all the words that satisfy $\varphi$. Formally, $\mathcal{L}(A_\varphi) = \{\tau \in (2^{AP})^\omega : \tau \models \varphi\}$.



3 An Automata-Based Algorithm for Computing Coverage 

In this section we extend automata-based model-checking algorithms to find the set
of covered states. In automata-based model checking, we translate an LTL
specification $\varphi$ to a nondeterministic Büchi automaton $A_{\neg\varphi}$ that accepts all words that do not
satisfy $\varphi$ [VW94]. Model checking of $K$ with respect to $\varphi$ can then be reduced to
checking the emptiness of the product $K \times A_{\neg\varphi}$. Let $K = \langle AP, W, R, w_{in}, L \rangle$ be a
Kripke structure that satisfies $\varphi$, and let $A_{\neg\varphi} = \langle 2^{AP}, S, \delta, S_0, \alpha \rangle$ be the
nondeterministic Büchi automaton for $\neg\varphi$. The product of $K$ with $A_{\neg\varphi}$ is the fair Kripke structure
$K \times A_{\neg\varphi} = \langle AP, W \times S, M, \{w_{in}\} \times S_0, L', W \times \alpha \rangle$, where $M(\langle w, s \rangle, \langle w', s' \rangle)$ iff
$R(w, w')$ and $s' \in \delta(s, L(w))$, and $L'(\langle w, s \rangle) = L(w)$. Note that an infinite path $\pi$ in
$K \times A_{\neg\varphi}$ is fair iff the projection of $\pi$ on $S$ satisfies the acceptance condition of $A_{\neg\varphi}$.
Since $K$ satisfies $\varphi$, we know that no initialized path of $K$ is accepted by $A_{\neg\varphi}$. Hence,
$\mathcal{L}(K \times A_{\neg\varphi})$ is empty.

Let $P \subseteq W \times S$ be the set of pairs $\langle w, s \rangle$ such that $A_{\neg\varphi}$ can reach the state $s$ as
it reads the state $w$. That is, there exists a sequence $\langle w_0, s_0 \rangle, \ldots, \langle w_k, s_k \rangle$ such that
$w_0 = w_{in}$, $s_0 \in S_0$, $w_k = w$, $s_k = s$, and for all $0 \le i < k$ we have $R(w_i, w_{i+1})$
and $s_{i+1} \in \delta(s_i, L(w_i))$. Note that $\langle w, s \rangle \in P$ iff $\langle w, s \rangle$ is reachable in $K \times A_{\neg\varphi}$.
For an observable signal $q \in AP$ and $w \in W$, we define the set $P_{w,q} \subseteq W \times S$ as
the set of pairs $\langle w', s' \rangle$ such that $w'$ is a successor of $w$ and $A_{\neg\varphi}$ can reach the state
$s'$ as it reads the state $w'$ in a run in which the last occurrence of $w$ has $q$ flipped.
Formally, if we denote by $\tilde{L}_q : W \to 2^{AP}$ the labeling function with $q$ flipped (that is,
$\tilde{L}_q(w) = L(w) \cup \{q\}$ if $q \notin L(w)$, and $\tilde{L}_q(w) = L(w) \setminus \{q\}$ if $q \in L(w)$), then

$P_{w,q} = \{\langle w', s' \rangle$ : there is $s \in S$ such that $\langle w, s \rangle \in P$, $R(w, w')$, and $s' \in \delta(s, \tilde{L}_q(w))\}$.

Recall that a state $w$ is node $q$-covered in $K$ iff there exists a $w$-node $x$ in $T^K$ such
that $\tilde{\mathcal{K}}_{x,q}$ does not satisfy $\varphi$. We can characterize node $q$-covered states also as follows
(see the full version for the proof).

Theorem 2. Consider a Kripke structure $K$, an LTL formula $\varphi$, and an observable
signal $q$. A state $w$ is node $q$-covered in $K$ by $\varphi$ iff there is a successor $w'$ of $w$ and a
state $s'$ such that $\langle w', s' \rangle \in P_{w,q}$ and there is a fair $\langle w', s' \rangle$-path in $K \times A_{\neg\varphi}$.
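
The characterization of Theorem 2 can be turned into a small explicit-state procedure. The sketch below is an illustration under stated assumptions, not the COSPAN implementation: it enumerates product states explicitly, uses a simple quadratic search for the states with a fair path, and takes the automaton for the negated specification as a hand-written input. The function name node_covered and the two example automata (for the negations of $Fq$ and of $Gq \vee G\neg q$) are hypothetical.

```python
def node_covered(W, R, w_in, L, S, delta, S0, alpha, q):
    """Return the states of K that are node q-covered, following Theorem 2."""
    def step(s, letter):
        return delta.get((s, frozenset(letter)), set())

    def succ(node):                       # the relation M of the product
        w, s = node
        return {(w2, s2) for (w1, w2) in R if w1 == w for s2 in step(s, L[w])}

    def reach_plus(start):                # product states reachable in >= 1 step
        seen, todo = set(), [start]
        while todo:
            for m in succ(todo.pop()):
                if m not in seen:
                    seen.add(m)
                    todo.append(m)
        return seen

    # P: the reachable states of the product.
    P = set()
    for s0 in S0:
        P |= {(w_in, s0)} | reach_plus((w_in, s0))

    # Product states from which a fair path exists: those that are, or can
    # reach, a state with an accepting component that lies on a cycle.
    nodes = {(w, s) for w in W for s in S}
    recurring = {n for n in nodes if n[1] in alpha and n in reach_plus(n)}
    fair = {n for n in nodes if n in recurring or reach_plus(n) & recurring}

    covered = set()
    for w in W:
        flipped = frozenset(L[w]) ^ {q}   # the label of w with q flipped
        P_wq = {(w2, s2) for (w1, s) in P if w1 == w
                for (u, w2) in R if u == w
                for s2 in step(s, flipped)}
        if P_wq & fair:
            covered.add(w)
    return covered


Q, E = frozenset({"q"}), frozenset()
W, R, L = {"w"}, {("w", "w")}, {"w": Q}   # one state labeled q, with a self-loop

# Automaton for not(F q), i.e. G not q.
neg_phi1 = dict(S={0}, delta={(0, E): {0}}, S0={0}, alpha={0})
# Automaton for not(G q or G not q), i.e. F q and F not q.
neg_phi2 = dict(S={0, 1, 2, 3}, S0={0}, alpha={3},
                delta={(0, Q): {1}, (0, E): {2}, (1, Q): {1}, (1, E): {3},
                       (2, E): {2}, (2, Q): {3}, (3, Q): {3}, (3, E): {3}})

nc1 = node_covered(W, R, "w", L, q="q", **neg_phi1)
nc2 = node_covered(W, R, "w", L, q="q", **neg_phi2)
```

On the one-state structure of Section 2.2, the procedure reports that $w$ is not node $q$-covered by $Fq$ and is node $q$-covered by $Gq \vee G\neg q$, in agreement with the example preceding Theorem 1.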

Theorem 2 reduces the problem of checking whether a state $w$ is node $q$-covered to
computing the relation $P_{w,q}$ and checking for the existence of a fair path from a state in
the product $K \times A_{\neg\varphi}$. Model-checking tools compute the relation $P$ and compute the
set of states from which we have fair paths. Therefore, Theorem 2 suggests an easy
implementation for the problem of computing the set of node-covered states. We describe
a possible implementation in the tool COSPAN, which is the engine of FormalCheck
[HHK96,Kur98]. We also show that the implementation can be modified in order to
handle structure and tree coverage.

In COSPAN, the system is modeled by a set of modules, and the desired behavior
is specified by an additional module $A$. The language $\mathcal{L}(A)$ is exactly the set of wrong
behaviors, thus the module $A$ stands for the automaton $A_{\neg\varphi}$ in case the specification
is given as an LTL formula $\varphi$. In order to compute the set of node $q$-covered states, the
system has to nondeterministically choose a step in the synchronous composition of the
modules, in which the value of $q$ is flipped in all modules that refer to $q$. Note that this
is the same as choosing a step in which the module $A$ behaves as if it reads the dual
value of $q$. This can be done by introducing two new Boolean variables flip and flag,
local to $A$. The variable flip is nondeterministically assigned true or false in each step.
The variable flag is initialized to true and is set to false one step after flip becomes true.
Instead of reading $q$, the module $A$ reads $q \oplus (flip \wedge flag)$. Thus, when both flip and
flag hold, which happens exactly once, the value of $q$ is flipped ($\oplus$ stands for exclusive
or). So, the synchronous composition of the modules is not empty iff the state that was
visited when flip becomes true for the first time is node $q$-covered. The complexity of
model checking is linear in the size of the state space of the model, which is bounded
by $O(2^n)$, where $n$ is the number of state variables. We increase the number of state
variables by 2, thus the complexity of coverage computation is still $O(2^n)$.
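
The interplay of flip and flag can be simulated on a single run. In the sketch below, the nondeterministic choices of flip are supplied as an explicit list, and the function returns the sequence of values that the module would read in place of $q$. The function name observed is an assumption of this illustration; the actual mechanism lives inside the COSPAN modules.

```python
def observed(q_values, flip_choices):
    """Values read instead of q, i.e. q xor (flip and flag), along one run."""
    flag = True                      # flag is initialized to true
    out = []
    for q, flip in zip(q_values, flip_choices):
        out.append(q ^ (flip and flag))
        if flip:
            flag = False             # takes effect one step after flip becomes true
    return out


reads = observed([True, True, True, True], [False, True, True, False])
```

Even though flip is chosen true in two consecutive steps, the value of $q$ is flipped only in the first of them, because flag has already dropped to false in the second.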
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With a small change in the implementation we can also check tree coverage. Since 
in tree coverage we can flip the value of q several times, the variable is no longer 
needed. Instead, we need log \ W\ variables x\, , Xiog \w\ for encoding the state w 
that is now being checked for tree q-coverage. The state w is not known in advance 
and the variables xi,. . . ,a;iog|w| initialized non-deterministically and then kept 
unchanged to maintain the encoding of some state of the system. The variable flip is 
nondeterministically assigned true or false in each step. Instead of reading q, the mod- 
ule A reads q 0 (flip A atjw), where atjw holds iff the encoding of the current state 
coincides with xi, . . . , xiog \w\- Thus, when both flip and atjw hold, which may hap- 
pen several times, yet only when the current state is w, the value of q is flipped. So, the 
synchronous composition of the modules is not empty iff the state that was visited when 
flip becomes true for the first time is tree g-covered. Finally, by nondeterministically 
choosing the values of a;i, . . . , Xiog \w\ the first step of the run and fixingflip to true, 
we can also check structure coverage. 
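
The two reading rules above can be illustrated with a small sketch. It is not the authors' implementation: the function name `observed_q`, the explicit list of flip choices, and the direct comparison `state == target` (standing in for the encoding variables $x_1, \ldots, x_{\log |W|}$ and the predicate at_w) are all assumptions made for illustration.

```python
def observed_q(path, q_label, flip_choices, target=None):
    """Values of q as module A would read them along `path`.

    target=None : node coverage, A reads q xor (flip and flag).
    target=w    : tree coverage, A reads q xor (flip and at_w).
    `flip_choices` plays the role of the nondeterministic variable flip.
    """
    flag = True  # initialized to true, cleared once flip has been true
    seen = []
    for state, flip in zip(path, flip_choices):
        if target is None:
            guard = flip and flag
            if flip:
                flag = False  # from the next step on, no further flip
        else:
            guard = flip and state == target  # at_w: current state is w
        seen.append(q_label[state] ^ guard)
    return seen


if __name__ == "__main__":
    labels = {0: True, 1: False, 2: True}
    print(observed_q([0, 1, 0, 2], labels, [False, True, True, False]))
    print(observed_q([0, 1, 0, 2], labels, [True, True, False, True], target=0))
```

In the node-coverage mode the value of q is inverted only at the first step where flip holds; in the tree-coverage mode it is inverted at every step where flip holds and the current state is the chosen one.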

The complexity of coverage computation for tree and structure coverage is a function
of the size of the state space, which is at most exponential in the number of state
variables. For both tree and structure coverage, we double the number of variables by
introducing n new variables that encode the flipped state. Thus, the state-space size is
$O(2^{2n})$ instead of $O(2^n)$. While symbolic algorithms may have the same worst-case
complexity as enumerative algorithms, in practice they are typically superior for many
classes of applications. We believe that there is an ordering of the BDD variables that
would circumvent the worst-case complexity. On the other hand, the naive approach
always requires $2^n$ model-checking iterations. Thus, our algorithm is likely to perform
better than the naive approach.

In our definitions of coverage we assumed that a change in the labeling of states 
does not affect the transitions of the system. This is why the transitions of the modules 
that model the behavior of the system remain unchanged when flipping happens. A 
different definition, which involves changes in the transition relation is required when 
we assume that the states are encoded by atomic propositions in AP and the transition 
relation is given as a relation between values of the atomic propositions in AP. Then, 
flipping q in a state w causes changes in the transitions to and from w [CKV01]. Thus,
in this case it is not enough to change the module A in order to compute the covered
sets and we also have to change the modules of the system. This can be achieved by
defining the variables flip and flag globally, and referring to their value in all modules
of the system. This involves a broader change in the source code of the model. 

Note that our algorithm is independent of the fairness condition being Büchi, and it
can handle any fairness condition for which the model-checking procedure supports the 
check for fair paths. Also, it is easy to see that the same algorithm can handle systems 
with multiple initial states. 



4 Indicators for LTL Formulas 



In this section we reduce the computation of node q-covered sets to model checking.
Given an LTL formula $\varphi$ and an observable signal q, we want to find an indicator
formula for $\varphi$ that distinguishes between the covered and uncovered states in all Kripke
structures. Formally, we have the following.

Definition 2. Given an LTL formula $\varphi$ and an observable signal q, an indicator for $\varphi$
and q is a formula $Ind_q(\varphi)$ such that for all Kripke structures K that satisfy $\varphi$, we have

$\{w \in W : w \models Ind_q(\varphi)\} = NC(K, \varphi, q)$.

The motivation for indicators is clear. Once $Ind_q(\varphi)$ is found, global model-checking
procedures can return the set of node q-covered states.

We show that for LTL formulas, we can construct indicators in the full μ-calculus,
where we allow both future and past modalities (see [Koz83] for a definition of
μ-calculus with future modalities). Formally, the full μ-calculus for a set AP of atomic
propositions and the set Var of variables includes the following formulas:

- true, false, p, for $p \in AP$, and y, for $y \in Var$.

- $\neg\varphi_1$, $\varphi_1 \vee \varphi_2$, and $\varphi_1 \wedge \varphi_2$ for full μ-calculus formulas $\varphi_1$ and $\varphi_2$.

- $AX\varphi$, $EX\varphi$, $AY\varphi$, and $EY\varphi$ for a full μ-calculus formula $\varphi$.

- $\mu y.\varphi(y)$ and $\nu y.\varphi(y)$, where $y \in Var$ and $\varphi(y)$ is a full μ-calculus formula monotone
in y.

A sentence is a formula that contains no free atomic proposition variables. The
semantics of full μ-calculus sentences is defined with respect to Kripke structures. The
semantics of the path quantifiers A (“for all paths”) and E (“there exists a path”), and
the temporal operators X (“next”) and Y (“yesterday”) assumes that both future and
past are branching [KP95]. That is, for a state w, we have $w \models AX\varphi$ iff for all v such
that $R(w, v)$, we have $v \models \varphi$, and $w \models AY\varphi$ iff for all u such that $R(u, w)$, we have
$u \models \varphi$. We assume that the initial states of the Kripke structure are labeled with a
special atomic proposition init (wq |= AVfalse and init ^ AVtrue). 
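
As a small illustration of the branching-past semantics, the following sketch evaluates AX and AY over an explicitly given transition relation. The function names `ax` and `ay` and the representation of R as a set of pairs are assumptions made for the example, not part of the paper.

```python
def ax(states, R, sat):
    """States all of whose successors (under R) lie in `sat`."""
    return {w for w in states if all(v in sat for (u, v) in R if u == w)}


def ay(states, R, sat):
    """States all of whose predecessors (under R) lie in `sat`."""
    return {w for w in states if all(u in sat for (u, v) in R if v == w)}


if __name__ == "__main__":
    states = {0, 1, 2}
    R = {(0, 1), (1, 2), (2, 2)}
    print(ax(states, R, {2}))    # states whose successors all satisfy the formula
    print(ay(states, R, set()))  # AY false: states without predecessors
```

Note that AY false holds vacuously exactly in the states that have no predecessor, which is how a past modality can single out such states.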

The construction of $Ind_q(\varphi)$ proceeds as follows. We first construct a formula,
denoted $\Psi$, that describes $A_{\neg\varphi}$. The formula $\Psi$ is a disjunction of formulas $\psi_s$, for states
s of $A_{\neg\varphi}$, and it describes states of $A_{\neg\varphi}$ that participate in an accepting run of $A_{\neg\varphi}$.
For each state s, the formula $\psi_s$ is the conjunction of two formulas, $Reach_s$ and $Acc_s$,
defined as follows.

- The formula $Reach_s$ is satisfied in a state w of a Kripke structure K iff there exists
a run of $A_{\neg\varphi}$ on an initialized path of K that visits the state s as it reads w.

- The formula $Acc_s$ is satisfied in a state w of a Kripke structure K iff there exists
an accepting run of $A^s_{\neg\varphi}$ on a w-path of K (recall that $A^s_{\neg\varphi}$ is defined as $A_{\neg\varphi}$ with
initial set {s}).

Then, $\Psi = \bigvee_{s \in S} Reach_s \wedge Acc_s$. So, for every Kripke structure K, a state w in K
satisfies $\Psi$ iff there exists a state $s \in S$ such that there exists an accepting run of $A_{\neg\varphi}$
on an initialized path of K that visits the state s as it reads w. The formulas $Reach_s$ refer
to the past and are constructed as in [HKQ98] using past modalities. The formulas $Acc_s$
refer to the future and are constructed as in [EL86,BC96], using future modalities.

The algorithms in [HKQ98,EL86,BC96] construct μ-systems of equational blocks, and not
μ-calculus formulas. The translation from μ-formulas to μ-systems may involve an exponential

Note that $K \not\models \varphi$ iff there exists $w \in W$ such that $w \models \Psi$. Since K satisfies $\varphi$,
there is no state $w \in W$ that satisfies $\Psi$. Our goal is to find the node q-covered states
of K. These are the states that satisfy $\Psi$ after a flip of the value of q in them. In order
to simulate such a flip, we have to separate between the part that describes present
behavior, and the parts that describe past or future behavior in the formulas $Reach_s$
and $Acc_s$, respectively. For that, we first replace all μ-calculus formulas by equivalent
guarded formulas. A μ-calculus formula is guarded if for all $y \in Var$, all the occurrences
of y are in the scope of X or Y [BB87]. It is shown in [KVW00] that given a μ-calculus
formula, we can construct an equivalent guarded formula in linear time. Then, in order
to separate the part that describes present behavior, we replace each formula $\lambda y.f(y)$,
with $\lambda \in \{\mu, \nu\}$, by the equivalent formula $f(\lambda y.f(y))$. For example, the formula
$\mu y.p \vee AXy$ is replaced by $p \vee AX\,\mu y.p \vee AXy$. In fact, when constructed as in [HKQ98],
the formulas $Reach_s$ are already pure-past formulas: they do not refer to the present, and
the above separation is required only for the formulas $Acc_s$.

We can now complete the construction of the indicators. We distinguish between
two cases. In the first case, w is labeled q and we check whether changing the label to
$\neg q$ creates an accepting run of $A_{\neg\varphi}$. In the second case, w is labeled $\neg q$ and we check
whether changing the label to q creates an accepting run of $A_{\neg\varphi}$. Let $NC^+(K, \varphi, q)$
be the set of node q-covered states of K for $\varphi$ and q that are labeled with q, that is,
$NC^+(K, \varphi, q) = NC(K, \varphi, q) \cap \{w \in W : q \in L(w)\}$. Let $\Psi^+$ be the formula
obtained from $\Psi$ by replacing with q each occurrence of $\neg q$ that is not in the scope
of a temporal operator. A state $w \in W$ satisfies $\Psi^+$ iff there exists a state $s \in S$
and an accepting run of $A_{\neg\varphi}$ on an initialized path of K that visits the state s as it
reads w with the value of q flipped. Thus, the set $NC^+(K, \varphi, q)$ is exactly the set
$\{w \in W : w \models \Psi^+\}$. In a similar way we can define $NC^-(K, \varphi, q)$ as the set
$NC(K, \varphi, q) \cap \{w \in W : q \notin L(w)\}$, and the formula $\Psi^-$ that is obtained from $\Psi$ by
replacing with $\neg q$ each positive occurrence of q that is not in the scope of a temporal
operator. The set $NC^-(K, \varphi, q)$ is exactly the set $\{w \in W : w \models \Psi^-\}$. Now, the
indicator formula for $\varphi$ is $Ind_q(\varphi) = \Psi^+ \vee \Psi^-$.

Theorem 3. Given an LTL formula $\varphi$ and an observable signal q, there exists a full
μ-calculus formula $Ind_q(\varphi)$ of size exponential in $\varphi$ such that for every Kripke structure
K, the set of node-uncovered states of K with respect to $\varphi$ and q is exactly the set of
states of K that satisfy $Ind_q(\varphi)$.

As discussed in [HKQ98,EL86,BC96], $\Psi$ has alternation depth 2 (alternation is
required in order to specify Büchi acceptance) and is alternation free if $\varphi$ is a safety
formula (then, $A_{\neg\varphi}$ can be made an automaton with a looping acceptance condition
[Sis94]). The size of the automaton $A_{\neg\varphi}$ is exponential in the size of the formula
$\varphi$ [VW94], and the size of the formulas $Reach_s$ and $Acc_s$ is linear in the size of
$A_{\neg\varphi}$. Hence, the size of the indicator formula $Ind_q(\varphi)$ is exponential in the size of $\varphi$.

blow-up. While our algorithm is described here in terms of μ-calculus formulas, we can
work with μ-systems directly. The operators $\wedge$ and $\vee$ on μ-formulas are defined on the systems
of equational blocks as well.

^ This semantics naturally translates to node coverage. For structure and tree coverage other
definitions are needed.

We note that the exponential blow-up may not appear in practice [KV98,BRS99]. Since
the semantics of μ-calculus with past modalities refers to structures, rather than trees
(that is, the past is branching), model-checking algorithms for μ-calculus with only
future modalities can be modified to handle past without increasing complexity [KP95].
The complexity of model checking $K \models \psi$ for a μ-calculus formula $\psi$ with alternation depth
2 is quadratic in $|K| \cdot |\psi|$ [EL86]. For alternation-free μ-calculus, the complexity is
linear [CS91]. So, since the indicator is exponential in $|\varphi|$, the complexity of finding the
covered set using our reduction is $(|K| \cdot 2^{O(|\varphi|)})^2$ for general LTL properties and is
$O(|K| \cdot 2^{O(|\varphi|)})$ for safety properties.

Remark 1. Two-way bisimulation extends bisimulation by examining both successors
and predecessors of a state [HKQ98]. Two states w and w' are two-way bisimilar iff
they satisfy the same full μ-calculus formulas. Since indicators are full μ-calculus
formulas, it follows that if w and w' are two-way bisimilar, they agree on the value of the
indicator formula, thus w is node q-covered iff w' is node q-covered. In other words, the
distinguishing power of node coverage is not greater than that of two-way bisimulation.
In the full version we show that node coverage can distinguish between one-way
bisimilar states. Thus, the use of full μ-calculus is essential for the construction of indicators.
Abstract. In this paper we propose an efficient algorithmic solution to
the problem of determining a Bisimulation Relation on a finite structure.
Starting from a set-theoretic point of view we propose an algorithm that
optimizes the solution to the Relational Coarsest Partition problem given
by Paige and Tarjan in 1987; its use in model-checking packages is
briefly discussed and tested. Our algorithm reaches, in particular cases,
a linear solution.

Keywords: Bisimulation, non well-founded sets, automata, verification. 



1 Introduction 

It is difficult to accurately list all the fields in which, in one form or another, the 
notion of bisimulation was introduced and now plays a central role. Among the 
most important ones are: Modal Logic, Concurrency Theory, Formal Verification, 
and Set Theory. 

Several existing verification tools make use of bisimulation in order to minimize
the state spaces of system descriptions. The reduction of the number of
states is important both in compositional and in non-compositional model checking.
Bisimulation also serves as a means of checking equivalence between transition
systems. The verification environment XEVE provides bisimulation
tools which can be used for both minimization and equivalence testing. In general,
in the case of explicit-state representation, the underlying algorithm used
is the one proposed by Kanellakis and Smolka, while the algorithm of Bouali and
de Simone is used in the case of symbolic representation. The Concurrency
Factory project tests bisimulation using techniques based on the Kanellakis
and Smolka algorithm. As for criticism of the use of bisimulation algorithms,
Fisler and Vardi observe that “bisimulation minimization does not appear
to be viable in the context of invariance verification”, but in the context of
compositional verification it “makes certain problems tractable that would not
be so without minimization”.

The first significant result related to the algorithmic solution of the bisimulation
problem is due to Hopcroft, who presents an algorithm for the minimization
of the number of states in a given finite state automaton. The problem
is equivalent to that of determining the coarsest partition of a set stable with 

G. Berry, H. Comon, and A. Finkel (Eds.): CAV 2001, LNCS 2102, pp. 2001. 

© Springer-Verlag Berlin Heidelberg 2001 






Agostino Dovier, Carla Piazza, and Alberto Policriti 



respect to a finite set of functions. A variant of this problem has also been studied,
and it has been shown how to solve it in linear time in the case of a single function.
Finally, Paige and Tarjan solved the problem for the general case (i.e.,
bisimulation) in which the stability requirement is relative to a relation E (on a
set N) with an algorithm whose complexity is $O(|E| \log |N|)$.

The main feature of the linear solution to the single function coarsest partition
problem is the use of a positive strategy in the search for the
coarsest partition: the starting partition is the partition with singleton classes
and the output is built via a sequence of steps in which two or more classes
are merged. Instead, Hopcroft’s solution to the (more difficult) many functions
coarsest partition problem is based on a (somewhat more natural) negative strategy:
the starting partition is the input partition and each step consists of the
split of all those classes for which the stability constraint is not satisfied. The
interesting feature of Hopcroft’s algorithm lies in its use of a clever ordering (the
so-called “process the smallest half” ordering) for processing classes that must
be used in a split step. Starting from an adaptation of Hopcroft’s idea to the
relational coarsest partition problem, Paige and Tarjan succeeded in obtaining
their fast solution. Another algorithm from the literature is based on the naive
negative strategy, but on each iteration it stabilizes only reachable blocks with
respect to all blocks. This was later improved so that only reachable blocks are
stabilized, and only with respect to reachable blocks.

In this paper we present a procedure that integrates positive and negative
strategies to obtain the algorithmic solution to the bisimulation problem and
hence to the relational coarsest partition problem. The strategy we develop is
driven by the set-theoretic notion of rank of a set. The algorithm we propose
uses the procedures of Paige and Tarjan and of Paige, Tarjan, and Bonic as
subroutines and terminates in linear time in many cases,
for example when the input problem corresponds to a bisimulation problem on
acyclic graphs (well-founded sets). It operates in linear time in other cases as
well and, in any case, it runs at a complexity less than or equal to that of the
algorithm by Paige and Tarjan. Moreover, the partition imposed by the rank
allows the input to be processed without storing the entire structure in memory at
the same time.

The paper is organized as follows: in the next section we introduce the
set-theoretic formulation of the bisimulation problem. The subsequent Section 3
contains the algorithm for the well-founded case. Section 4 presents the basic idea
of our proposed algorithm and its optimizations are explained in the following
section. In Section 6 we show how our results and methods can be adapted to the
multi-relational coarsest partition problem (i.e., bisimulation on labeled graphs);
the sections after that discuss some testing results and draw some conclusions.
Detailed proofs of all the statements in this paper can be found in Q.

2 The Problem: A Set-Theoretic Perspective 

One of the main features of intuitive (naive) Set Theory is the well-foundedness
of membership. As a consequence, standard axiomatic set theories include the
foundation axiom that forces the membership relation to form no cycles or
infinite descending chains. In the 80’s the necessity to consider theories that do not
assume this strong constraint (re-)emerged in many communities; hence various
proposals for (axiomatic) non well-founded set theories (and universes) were
developed.

Sets can be seen as nothing but accessible pointed graphs (cf. Definition 1).
Edges represent membership: $m \to n$ means that m has n as an element, and
the nodes in the graph denote all the sets which contribute to the construction
of the represented set.

Definition 1. An accessible pointed graph (apg) $\langle G, n\rangle$ is a directed graph
$G = \langle N, E\rangle$ together with a distinguished node $n \in N$ such that all the nodes in N
are reachable from n.

The resulting set-theoretic semantics for apg’s
is based on the natural notion of picture of an apg. The extensionality axiom,
which says that two objects are equal if and only if they contain exactly the same
elements, is the standard criterion for establishing equality between sets. If
extensionality is assumed it is immediate to see that, for example, different
acyclic graphs can represent the same set. However, extensionality leads to a
cyclic argument (no wonder!) whenever one tries to apply it as a test to establish
whether two cyclic graphs represent the same non well-founded set (hyperset).
To this end a condition (bisimulation) on apg’s can be stated in accordance with
extensionality: two apg’s are bisimilar if and only if they are representations of
the same set.

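Definition 2. Given two graphs $G_1 = \langle N_1, E_1\rangle$ and $G_2 = \langle N_2, E_2\rangle$, a bisimulation
between $G_1$ and $G_2$ is a relation $b \subseteq N_1 \times N_2$ such that:

1. $u_1 \, b \, u_2 \wedge \langle u_1, v_1\rangle \in E_1 \Rightarrow \exists v_2 \in N_2 \, (v_1 \, b \, v_2 \wedge \langle u_2, v_2\rangle \in E_2)$

2. $u_1 \, b \, u_2 \wedge \langle u_2, v_2\rangle \in E_2 \Rightarrow \exists v_1 \in N_1 \, (v_1 \, b \, v_2 \wedge \langle u_1, v_1\rangle \in E_1)$.

Two apg’s $\langle G_1, n_1\rangle$ and $\langle G_2, n_2\rangle$ are bisimilar if and only if there exists a
bisimulation b between $G_1$ and $G_2$ such that $n_1 \, b \, n_2$.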

We can now say that two hypersets are equal if their representations are
bisimilar. For example, the apg $\langle\langle\{n\}, \emptyset\rangle, n\rangle$ represents the empty set $\emptyset$. The
hyperset $\Omega$, i.e. the unique hyperset which satisfies the equation $x = \{x\}$,
can be represented using the apg $\langle\langle\{n\}, \{\langle n, n\rangle\}\rangle, n\rangle$. Any graph such that each
node has at least one outgoing edge can be shown to be a representation of
$\Omega$. It is clear that for each set there exists a collection of apg’s which are all
its representations. It is the notion of bisimulation which allows us to
find a minimum representation (one in which no two nodes represent the same
hyperset). Given an apg $\langle G, n\rangle$ that represents a set S, to find the minimum
representation for S it is sufficient to consider the maximum bisimulation $\equiv$
between G and G. Such a bisimulation $\equiv$ always exists and is an equivalence
relation over the set of nodes of G. The minimum representation of S is the apg
$\langle G/{\equiv}, [n]\rangle$, which is usually called the bisimulation contraction of G.

An equivalent way to present the problem is to define the concept of
bisimulation as follows.

Definition 3. Given a graph $G = \langle N, E\rangle$, a bisimulation on G is a relation
$b \subseteq N \times N$ such that:

1. $u_1 \, b \, u_2 \wedge \langle u_1, v_1\rangle \in E \Rightarrow \exists v_2 \, (v_1 \, b \, v_2 \wedge \langle u_2, v_2\rangle \in E)$

2. $u_1 \, b \, u_2 \wedge \langle u_2, v_2\rangle \in E \Rightarrow \exists v_1 \, (v_1 \, b \, v_2 \wedge \langle u_1, v_1\rangle \in E)$.

A bisimulation on G is nothing but a bisimulation between G and G. The problem
of recognizing if two graphs are bisimilar and the problem of determining
the maximum bisimulation on a graph are equivalent. Two disjoint apg’s
$\langle\langle N_1, E_1\rangle, \nu_1\rangle$ and $\langle\langle N_2, E_2\rangle, \nu_2\rangle$ are bisimilar if and only if $\nu_1 \equiv \nu_2$, where $\equiv$ is
the maximal bisimulation on $\langle\langle N_1 \cup N_2 \cup \{\mu\}, E_1 \cup E_2 \cup \{\langle\mu, \nu_1\rangle, \langle\mu, \nu_2\rangle\}\rangle, \mu\rangle$,
with $\mu$ a new node. We consider the problem of finding the minimum graph
bisimilar to a given graph, that is, the bisimulation contraction of a graph.
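
Definition 3 can be turned directly into a naive greatest-fixpoint computation: start from the full relation $N \times N$ and discard pairs that violate one of the two conditions until nothing changes. The sketch below is only meant to make the definition concrete; it is far from the efficient algorithms discussed in this paper, and the function name `max_bisimulation` is an assumption.

```python
def max_bisimulation(nodes, edges):
    """Naive greatest fixpoint: the maximum bisimulation on (nodes, edges)."""
    succ = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)

    rel = {(u, v) for u in nodes for v in nodes}
    changed = True
    while changed:
        changed = False
        for (u1, u2) in list(rel):
            # Condition 1: every move of u1 is matched by some move of u2.
            forth = all(any((v1, v2) in rel for v2 in succ[u2]) for v1 in succ[u1])
            # Condition 2: every move of u2 is matched by some move of u1.
            back = all(any((v1, v2) in rel for v1 in succ[u1]) for v2 in succ[u2])
            if not (forth and back):
                rel.discard((u1, u2))
                changed = True
    return rel


if __name__ == "__main__":
    rel = max_bisimulation(["a", "b", "c"], [("a", "b"), ("a", "c")])
    print(("b", "c") in rel, ("a", "b") in rel)
```

Two nodes are bisimilar exactly when the pair survives in the returned relation; for instance, two leaves are always related, while a leaf and a node with an outgoing edge never are.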

The notion of bisimulation can be connected to the notion of stability: 

Definition 4. Let E be a relation on the set N, $E^{-1}$ its inverse relation, and P
a partition of N. P is said to be stable with respect to E iff for each pair $B_1, B_2$
of blocks of P, either $B_1 \subseteq E^{-1}(B_2)$ or $B_1 \cap E^{-1}(B_2) = \emptyset$.
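
The stability condition of Definition 4 can be checked block by block. The following sketch assumes the partition is given as a list of Python sets and E as a collection of pairs; the name `is_stable` is chosen for illustration.

```python
def is_stable(partition, edges):
    """True iff `partition` is stable with respect to the relation `edges`."""
    for b2 in partition:
        # E^{-1}(B2): nodes with at least one edge into B2.
        pre = {u for (u, v) in edges if v in b2}
        for b1 in partition:
            if not (b1 <= pre or b1.isdisjoint(pre)):
                return False
    return True


if __name__ == "__main__":
    print(is_stable([{1, 2}, {3}], [(1, 3), (2, 3)]))
    print(is_stable([{1, 2}, {3}], [(1, 3)]))
```

In the second call the block {1, 2} is split by $E^{-1}(\{3\}) = \{1\}$, so the partition is not stable.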

Given a set N, k relations $E_1, \ldots, E_k$ on N, and a partition P of N, the
multi-relational coarsest partition problem consists of finding the coarsest refinement
of P which is stable with respect to $E_1, \ldots, E_k$. As has been noted in the literature,
an algorithm that determines the coarsest partition of a set N stable with respect to
k relations solves exactly the problem of testing if two states of an observable
Finite State Process (FSP) are strongly equivalent. Our bisimulation problem
is a particular case of the observable FSP strong equivalence problem (k = 1).
In Section 6 we show how the case of bisimulation over a labeled graph
(multi-relational case) can be linearly reduced to our bisimulation problem. This means
that the problem of finding the bisimulation contraction of a graph is equivalent
to the multi-relational coarsest partition problem.



3 The Well-Founded Case 

We start by considering the case of acyclic graphs (well-founded sets). Similarly
to what is done in the minimization of Deterministic Finite Automata, it is
possible to determine the coarsest partition P stable w.r.t. E through the
computation of a greatest fixpoint. A “negative” (and blind with respect to
the relation) strategy is applicable: start with the coarsest partition $P = \{N\}$,
choose a class B (the splitter) and split all the classes using B whenever P is not
stable. The complexity of the algorithm, based on a negative strategy, presented
by Paige and Tarjan for this problem is $O(|E| \log |N|)$.

We will take advantage of the set-theoretic point of view of the problem in
order to develop a selection strategy for the splitters depending on the relation
E. Making use of the ordering induced by the notion of rank we will start from
a partition which is a refinement of the coarsest one; then we will choose the
splitters using the ordering induced by the rank. These two ingredients allow us to
obtain a linear-time algorithm.

Definition 5. Let $G = \langle N, E\rangle$ be a directed acyclic graph. The rank of a node
n is recursively defined as follows:

$$rank(n) = \begin{cases} 0 & \text{if } n \text{ is a leaf} \\ 1 + \max\{rank(m) : \langle n, m\rangle \in E\} & \text{otherwise} \end{cases}$$
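
The recursive definition can be evaluated with memoization so that each node is visited once. This is a minimal sketch assuming an acyclic input given as a node list and a list of edge pairs; `dag_ranks` is a hypothetical name.

```python
def dag_ranks(nodes, edges):
    """Rank of every node of an acyclic graph, following Definition 5."""
    succ = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)

    rank = {}

    def visit(n):
        if n not in rank:
            # A leaf has rank 0; otherwise 1 + the largest successor rank.
            rank[n] = 0 if not succ[n] else 1 + max(visit(m) for m in succ[n])
        return rank[n]

    for n in nodes:
        visit(n)
    return rank


if __name__ == "__main__":
    print(dag_ranks(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")]))
```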

The notion of rank determines a partition which is coarser than the maximum
bisimulation.

Proposition 1. Let m and n be nodes of an acyclic graph G. If $m \equiv n$, then
rank(m) = rank(n).

The converse, of course, is not true. Let P be a partition of N such that for each
block B in P it holds that $m, n \in B$ implies rank(m) = rank(n); then every
refinement of P fulfills the same property. Hence, we can assign to a block B the
rank of its elements.

Algorithm 1 (Well-Founded Case).

1. for $n \in N$ do compute rank(n); -- compute the ranks

2. $\rho := \max\{rank(n) : n \in N\}$;

3. for $i = 0, \ldots, \rho$ do $B_i := \{n \in N : rank(n) = i\}$;

4. $P := \{B_i : i = 0, \ldots, \rho\}$; -- P is the partition to be refined, initialized with the $B_i$'s

5. for $i = 0, \ldots, \rho$ do

   (a) $D_i := \{X \in P : X \subseteq B_i\}$; -- determine the blocks currently at rank i

   (b) for $X \in D_i$ do
          G := collapse(G, X); -- collapse nodes at rank i

   (c) for $n \in N \cap B_i$ do -- refine blocks at higher ranks
          for $C \in P$ and $C \subseteq B_{i+1} \cup \ldots \cup B_\rho$ do
             $P := (P \setminus \{C\}) \cup \{\{m \in C : \langle m, n\rangle \in E\}, \{m \in C : \langle m, n\rangle \notin E\}\}$;

Step 1 can be performed in time $O(|N| + |E|)$ by a depth-first visit of the
graph. Collapsing nodes $a_1, \ldots, a_k$, as in step 5(b), consists in eliminating all
nodes but $a_1$ and replacing all edges incident to $a_2, \ldots, a_k$ by edges incident to
$a_1$. Despite the nesting of for-loops the following holds.

Proposition 2. The algorithm for the well-founded case correctly computes the
bisimulation contraction of its input acyclic graph $G = \langle N, E\rangle$ and can be
implemented so as to run in linear time $O(|N| + |E|)$.
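
The bottom-up idea of the well-founded case can be sketched as follows. This is a simplified variant, not the linear-time implementation claimed in Proposition 2: instead of explicit collapse and split operations it groups the nodes of each rank by the set of representatives of their (already processed, lower-rank) successors, which yields the same classes on acyclic graphs. The name `contract_well_founded`, the use of representatives, and the assumption of a non-empty acyclic input are choices made for this illustration.

```python
def contract_well_founded(nodes, edges):
    """Bisimulation classes of a non-empty acyclic graph, processed rank by rank."""
    succ = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)

    rank = {}

    def compute_rank(n):
        if n not in rank:
            rank[n] = 0 if not succ[n] else 1 + max(compute_rank(m) for m in succ[n])
        return rank[n]

    for n in nodes:
        compute_rank(n)

    rep = {}  # node -> representative of its class (the "collapsed" node)
    for i in range(max(rank.values()) + 1):
        by_signature = {}
        for n in nodes:
            if rank[n] == i:
                # Successors have lower rank, so their representatives are known.
                signature = frozenset(rep[m] for m in succ[n])
                by_signature.setdefault(signature, []).append(n)
        for members in by_signature.values():
            for n in members:
                rep[n] = members[0]

    classes = {}
    for n in nodes:
        classes.setdefault(rep[n], set()).add(n)
    return {frozenset(c) for c in classes.values()}


if __name__ == "__main__":
    print(contract_well_founded(["a", "b", "c", "d"],
                                [("a", "b"), ("a", "c"), ("d", "b")]))
```

In the example the two leaves b and c fall into one class, and a and d, whose successors all lie in that class, fall into another.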

An example of computation of the above algorithm can be seen in Figure 1. In
all the examples we present, the computation steps proceed from left to right.

Fig. 1. Minimization Process.

Those who are familiar with OBDDs or with k-layered DFA’s can
read our algorithm for the well-founded case as a generalization of the
minimization algorithm for k-layered DFA. In the well-founded case we admit that a node
at the i-th layer may reach a node at the j-th layer with j > i.



4 Basic Idea for the General Case 

The presence of cycles causes the usual notion of rank (cf. Definition 5) to be
inadequate: an extension of such a notion must be defined.

Definition 6. Given a graph $G = \langle N, E\rangle$, let $G^{scc} = \langle N^{scc}, E^{scc}\rangle$ be the graph
obtained as follows:

$N^{scc} = \{c : c \text{ is a strongly connected component in } G\}$

$E^{scc} = \{\langle c_1, c_2\rangle : c_1 \neq c_2 \text{ and } \exists n_1 \in c_1, n_2 \in c_2 \, (\langle n_1, n_2\rangle \in E)\}$

Given a node $n \in N$, we refer to the node of $G^{scc}$ associated to the strongly
connected component of n as c(n).

Observe that $G^{scc}$ is acyclic and if G is acyclic then $G^{scc}$ is G itself.

We need to distinguish between the well-founded part and the non-well- 
founded part of a graph G. 

Definition 7. Let $G = \langle N, E\rangle$ and $n \in N$. $G(n) = \langle N(n), E \restriction N(n)\rangle$ is the
subgraph of G of the nodes reachable from n. WF(G), the well-founded part of
G, is $WF(G) = \{n \in N : G(n) \text{ is acyclic}\}$.

Observe that $\langle G(n), n\rangle$ is an apg; if $n \in WF(G)$ then it denotes a
well-founded set.

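Definition 8. Let $G = \langle N, E\rangle$. The rank of a node n of G is defined as:

$$rank(n) = \begin{cases}
0 & \text{if } n \text{ is a leaf in } G \\
-\infty & \text{if } c(n) \text{ is a leaf in } G^{scc} \text{ and } n \text{ is not a leaf in } G \\
\max(\{1 + rank(m) : \langle c(n), c(m)\rangle \in E^{scc}, m \in WF(G)\} \cup {} \\
\quad\;\; \{rank(m) : \langle c(n), c(m)\rangle \in E^{scc}, m \notin WF(G)\}) & \text{otherwise}
\end{cases}$$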

Since $G^{scc}$ is always acyclic, the definition is correctly given. If G is acyclic then
$G = G^{scc}$ and the above definition reduces to the one given in the well-founded
case (Def. 5). Nodes that are mapped into leaves of $G^{scc}$ are either bisimilar to
$\emptyset$ or to the hyperset $\Omega$. For a non-well-founded node different from $\Omega$ the rank
is 1 plus the maximum rank of a well-founded node reachable from it (i.e., a
well-founded set in its transitive closure).

We have explicitly used the graph $G^{scc}$ to provide a formal definition of the
notion of rank. However, the rank can be computed directly on G by two visits
of the graph, avoiding the explicit construction of $G^{scc}$.
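
For illustration, Definition 8 can be evaluated literally by first computing the strongly connected components. The sketch below uses Tarjan's recursive SCC algorithm rather than the two-visit method mentioned above, so it should be read as a direct transcription of the definition and not as the authors' procedure; the names `strongly_connected_components` and `general_ranks` are assumptions.

```python
import math


def strongly_connected_components(nodes, succ):
    """Tarjan's algorithm; maps each node to its component (a frozenset)."""
    index, low, comp = {}, {}, {}
    stack, on_stack = [], set()

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            members = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                members.append(w)
                if w == v:
                    break
            component = frozenset(members)
            for w in members:
                comp[w] = component

    for v in nodes:
        if v not in index:
            visit(v)
    return comp


def general_ranks(nodes, edges):
    """Rank of every node according to Definition 8."""
    succ = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)
    comp = strongly_connected_components(nodes, succ)

    wf = {}

    def is_wf(n):
        # n is well-founded iff it lies on no cycle and all successors are well-founded.
        if n not in wf:
            trivial = len(comp[n]) == 1 and n not in succ[n]
            wf[n] = trivial and all(is_wf(m) for m in succ[n])
        return wf[n]

    rank = {}

    def visit(n):
        if n in rank:
            return rank[n]
        if not succ[n]:
            r = 0
        else:
            c = comp[n]
            outside = {m for x in c for m in succ[x] if comp[m] != c}
            if not outside:  # c(n) is a leaf of the component graph
                r = -math.inf
            else:
                r = max(1 + visit(m) if is_wf(m) else visit(m) for m in outside)
        rank[n] = r
        return r

    for n in nodes:
        visit(n)
    return rank


if __name__ == "__main__":
    print(general_ranks(["a", "b", "c", "d", "e", "f"],
                        [("a", "b"), ("b", "a"), ("a", "c"),
                         ("d", "d"), ("e", "a"), ("f", "d")]))
```

In the example, d and f receive rank $-\infty$ (both are bisimilar to $\Omega$), the leaf c has rank 0, and the non-well-founded nodes a, b and e have rank 1, one more than the largest rank of a well-founded node they can reach.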



Proposition 3. Let m and n be nodes of a graph G:

1. $m \equiv \Omega$ if and only if $rank(m) = -\infty$;

2. $m \equiv n$ implies rank(m) = rank(n).

The converse of item 2 of Proposition 3 is not true. Moreover, the rank of c(n) in
$G^{scc}$ (that can be computed using Def. 5) is not necessarily equal to the rank of
n in G.

Given a graph $G = \langle N, E\rangle$ with $\rho = \max\{rank(n) : n \in N\}$, we call the
sets of nodes $B_{-\infty}, B_0, \ldots, B_\rho$, where $B_i = \{n \in N : rank(n) = i\}$, the rank
components of G.

Since we proved in the previous section that the bisimulation contraction
can be computed in linear time on well-founded graphs, it is easy to see that we
can use the algorithm for the well-founded case in order to process the nodes in
WF(G) for the general case. Hence, we can assume that the input graph for the
general case does not contain two different bisimilar well-founded nodes.

Algorithm 2 (General Case).

1. for $n \in N$ do compute rank(n); -- compute the ranks

2. $\rho := \max\{rank(n) : n \in N\}$;

3. for $i = -\infty, 0, \ldots, \rho$ do $B_i := \{n \in N : rank(n) = i\}$;

4. $P := \{B_i : i = -\infty, 0, \ldots, \rho\}$; -- P is the partition to be refined, initialized with the $B_i$'s

5. $G := collapse(G, B_{-\infty})$; -- collapse all the nodes of rank $-\infty$

6. for $n \in N \cap B_{-\infty}$ do -- refine blocks at higher ranks
      for $C \in P$ and $C \neq B_{-\infty}$ do
         $P := (P \setminus \{C\}) \cup \{\{m \in C : \langle m, n\rangle \in E\}, \{m \in C : \langle m, n\rangle \notin E\}\}$;

7. for $i = 0, \ldots, \rho$ do

   (a) $D_i := \{X \in P : X \subseteq B_i\}$; -- determine the blocks currently at rank i
       $G_i := \langle B_i, E \restriction B_i\rangle$; -- isolate the subgraph of rank i
       $D_i := \text{Paige-Tarjan}(G_i, D_i)$; -- process rank i

   (b) for $X \in D_i$ do
          G := collapse(G, X); -- collapse nodes at rank i

   (c) for $n \in N \cap B_i$ do -- refine blocks at higher ranks
          for $C \in P$ and $C \subseteq B_{i+1} \cup \ldots \cup B_\rho$ do
             $P := (P \setminus \{C\}) \cup \{\{m \in C : \langle m, n\rangle \in E\}, \{m \in C : \langle m, n\rangle \notin E\}\}$;


In steps 1-4 we determine the ranks and we initialize a variable P representing
the computed partition using the ranks. The collapse operation (steps 5 and 7(b))
is as in the well-founded case. Splitting of higher rank blocks is instead done in steps
6 and 7(c). Step 7 is the core of the algorithm, where optimizations will take
place. For each rank i we call the procedure of Paige and Tarjan on $G_i = \langle B_i, E \restriction B_i\rangle$,
with a cost $O(|E \restriction B_i| \log |B_i|)$, and we update the partition P on nodes of rank greater
than i. From these observations:

Proposition 4. If $G = \langle N, E\rangle$ is a graph, then the worst case complexity of the
above algorithm is $O(|E| \log |N|)$. The algorithm for the general case on input G
correctly computes the bisimulation contraction of G.

Proof. (Sketch) The global cost is no worse than (for some $c_1, c_2 \in \mathbb{N}$):

$$c_1(|N| + |E|) + \sum_{i=1}^{\rho} c_2 \, (|E \restriction B_i| \log |B_i|) = O(|E| \log |N|). \qquad (1)$$

The complexity of the method sketched above is asymptotically equivalent to
that of Paige and Tarjan. However, as for the well-founded Algorithm 1, we take
advantage of a refined initial partition and of a selection strategy of the blocks
to be selected for splitting blocks of higher ranks. In a single rank, the negative
strategy of the Paige-Tarjan algorithm is applied to the rank components which,
in general, are much smaller than the global graph. In particular, for families of
graphs such that $\rho$ is $O(|N|)$ and the size of each rank component is bounded
by a constant c the global cost becomes linear (cf. formula (1)).
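
The linear bound for constant-size rank components can be spelled out as follows. This is a short derivation under the stated assumption $|B_i| \le c$ for every i; it uses only the fact that the sets $E \restriction B_i$ are pairwise disjoint subsets of E.

```latex
\begin{align*}
c_1(|N| + |E|) + \sum_{i=1}^{\rho} c_2 \, |E \restriction B_i| \log |B_i|
  &\le c_1(|N| + |E|) + c_2 \log c \sum_{i=1}^{\rho} |E \restriction B_i| \\
  &\le c_1(|N| + |E|) + c_2 \log c \, |E| \\
  &= O(|N| + |E|).
\end{align*}
```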

5 Optimizations in the General Case 

We present here two situations in which we are able to optimize our algorithm. 
In some cases, a linear running time is reached. Other possible optimizations are 
presented in Q. 

First Optimization. This optimization makes use of the Paige-Tarjan-Bonic
procedure. Such a procedure can be used in some cases to solve the coarsest
partition problem in linear time adopting a “positive” strategy. Its integration
in our algorithm produces a global strategy that can therefore be considered as
a mix of positive and negative strategies.

Definition 9. A node n belonging to a rank component $B_i \subseteq N$ is said to be a
multiple node if $|\{m \in B_i : \langle n, m\rangle \in E\}| > 1$.

Whenever $B_i$ has no multiple nodes, we can replace the call to Paige-Tarjan
in step 7(a) with a call to Paige-Tarjan-Bonic. This allows us to obtain a linear
time performance at rank i (in formula (1) the term $c_2(|E \restriction B_i| \log |B_i|)$ can
be replaced by $c_3(|E \restriction B_i| + |B_i|)$ for some $c_3 \in \mathbb{N}$).
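
Whether a rank component qualifies for this optimization can be tested by counting, for each node, its successors inside the component. A minimal sketch, where `multiple_nodes` is a hypothetical helper name:

```python
def multiple_nodes(block, edges):
    """Nodes of `block` with more than one successor inside `block` (Definition 9)."""
    block = set(block)
    return {
        n for n in block
        if len({m for (u, m) in edges if u == n and m in block}) > 1
    }


if __name__ == "__main__":
    print(multiple_nodes({1, 2, 3}, [(1, 2), (1, 3), (2, 3), (3, 4)]))
```

The component has no multiple nodes exactly when the returned set is empty; edges leaving the component, such as (3, 4) in the example, are ignored.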

Proposition 5. The optimized algorithm for the general case on input G
correctly computes the bisimulation contraction of G. If $G = \langle N, E\rangle$ is a graph with
no multiple nodes, then its worst case complexity is $O(|N| + |E|)$.

In Figure 2 we show an example of a graph on which the above optimization
can be performed and the overall algorithm turns out to be linear.

Fig. 2. Example of the First Kind of Optimization.

Second Optimization. The crucial consideration behind the second optimization
we propose is the following: the outgoing edges of a node u allow one to
establish to which other nodes of the same rank component it is bisimilar. If
we have some means to know that u is not bisimilar to any other node of its
rank component, we can simply delete all edges outgoing from u. The deletion of
a set of edges splits a rank component (i.e., we can recalculate the rank) and
makes it possible to recursively apply our algorithm on a simpler case. The
typical case in which the above idea can be applied occurs when, at a given
iteration i, there exists a block X in the set Di of the blocks of rank i which
is a singleton set {n}: then all the outgoing edges from the node n can be
safely deleted. In the next section we show the usefulness of this optimization
in cases coming from formal verification.
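The singleton case can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the blocks of rank i are given as Python sets, edges as pairs, and the function name is hypothetical. Only the deletion step is shown; recomputing the ranks and re-applying the algorithm are left out.

```python
def drop_edges_of_singletons(edges, blocks):
    """Delete the outgoing edges of every node that is alone in its block.

    A node in a singleton block is bisimilar to no other node of its rank,
    so its outgoing edges carry no further information."""
    alone = {next(iter(block)) for block in blocks if len(block) == 1}
    return [(u, v) for (u, v) in edges if u not in alone]

# Hypothetical situation: node 1 forms a singleton block, nodes 2 and 3 do not.
pruned = drop_edges_of_singletons([(1, 2), (2, 3), (3, 1)], [{1}, {2, 3}])
```

In this example the cycle through node 1 is broken, which is what allows the rank to be recalculated on a simpler graph.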



6 Labeled Graphs 

In several applications (e.g., Concurrency, Databases, Verification) graphs to be 
tested for bisimilarity have labels on edges (typically, denoting actions) and, 
sometimes, labels on nodes (typically, stating a property that must hold in a 
state). If only edges are labeled, we are in the context of the multi-relation 
coarsest partition problem. The definition of bisimulation has to be refined in 
order to take into consideration the labels on nodes and the labels on edges. 

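Definition 10. Let $L$ be a finite set of labels and $A$ be a finite set of
actions. Given a labeled graph $G = \langle N, E, \ell \rangle$, with
$E \subseteq N \times A \times N$ (we use $u \xrightarrow{a} v \in E$ for
$\langle u, a, v \rangle \in E$) and $\ell : N \to L$, a labeled bisimulation
on $G$ is a symmetric relation $b \subseteq N \times N$ such that:

• if $u_1 \, b \, u_2$, then $\ell(u_1) = \ell(u_2)$;

• if $u_1 \, b \, u_2$ and $u_1 \xrightarrow{a} v_1 \in E$, then there is an
edge $u_2 \xrightarrow{a} v_2 \in E$ with $v_1 \, b \, v_2$.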

Let us analyze how our algorithm can solve the extended problem. To start, 
assume that only nodes are labeled. The only change is in the initialization 
phase: the partition suggested by the rank function must be refined so as to 
leave in the same block only nodes with the same label. Then the algorithm can 
be employed without further changes. Assume now that edges can be labeled. 



Fig. 3. Removing Edge Labels.

We suggest the following encoding: for each pair of nodes $m$, $n$ and for each
label $a$ such that there is an edge $m \xrightarrow{a} n \in E$ (see also
Fig. 3):

- remove the edge $m \xrightarrow{a} n$;

- add a new node $p$, labeled by the pair $(m, a)$;

- add the two (unlabeled) new edges $m \to p$, $p \to n$.

Starting from $G = \langle N, E, \ell \rangle$ we obtain a new graph
$G' = \langle N', E', \ell' \rangle$, with $E' \subseteq N' \times N'$, where
$|N'| = |N| + |E| = O(|N|^2)$ and $|E'| = 2|E|$. Thus, our algorithm can run in
$O(|E'| \log |N'|) = O(|E| \log |N|)$.
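The encoding above can be sketched in Python as follows. The function name, the use of tuples `("edge", k)` as fresh node names, and the dictionary representation of the labeling are assumptions of this example. The sketch also reads the pair labeling the new node as (label of m, action), which is an assumption about how the pair in the text is meant: using the label of m rather than its identity keeps nodes with equal labels and equal outgoing actions indistinguishable.

```python
def remove_edge_labels(nodes, labeled_edges, label):
    """Encode a graph with labeled edges as a graph with labeled nodes only.

    nodes: iterable of node names
    labeled_edges: iterable of (m, a, n) triples
    label: dict mapping each node to its label
    Returns (new_nodes, new_edges, new_label)."""
    new_nodes = list(nodes)
    new_edges = []
    new_label = dict(label)
    for k, (m, a, n) in enumerate(labeled_edges):
        p = ("edge", k)               # fresh node standing for the edge (assumed unused)
        new_nodes.append(p)
        new_label[p] = (label[m], a)  # assumption: (label of source, action)
        new_edges.append((m, p))      # the two unlabeled edges m -> p -> n
        new_edges.append((p, n))
    return new_nodes, new_edges, new_label

# Hypothetical labeled graph with two nodes and three labeled edges.
N = ["m", "n"]
E = [("m", "a", "n"), ("m", "b", "n"), ("n", "a", "m")]
ell = {"m": "red", "n": "blue"}
N2, E2, ell2 = remove_edge_labels(N, E, ell)
```

The sizes of the result match the counts stated above: one new node and two new edges per original edge.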

Proposition 6. Let $G = \langle N, E, \ell \rangle$ be a graph with labeled
edges and nodes, $\equiv$ be its maximum labeled bisimulation, and $G'$ the
graph with labeled nodes obtained from $G$. Then, $m \equiv n$ if and only if
$m$ and $n$ are in the same class at the end of the execution of our algorithm
on $G'$ with the initial partition (Step 4) further split using node labels.



7 Testing 

To the best of our knowledge there is no “official” set of benchmarks for
testing an algorithm such as the one we propose in this paper. We decided to
test our implementation in the context of formal verification, using model
checkers and considering the transition graphs they generate from a given
program. In particular, we have considered the transition systems generated by
the examples in the SPIN package, built using ideas from [?]; their aim is to
check that the implementation of a protocol satisfies a formal specification.
Usually, the graphs generated consist of a unique strongly connected component
and the set of possible labels is huge. When we rewrite them into unlabeled
graphs, we usually obtain graphs on which we can perform the second
optimization proposed in Section 5. Such an optimization allows us to delete
edges in the graphs, obtaining graphs on which the algorithm runs in linear
time. In Figure 4 we show the graph obtained for the process CpO of the
Snooping Cache protocol. From left to right are depicted: the labeled graph
generated, its corresponding unlabeled graph, the graph after our optimization,
and, finally, its bisimulation contraction, which can be computed in linear
time.




These considerations about the “topology” of verification graphs suggested
some examples on which to compare the performance of our algorithm with
that of Paige and Tarjan. Details about the implementation (both in C and in
Pascal), the machine used for the tests together with the code and the results of 
further tests are available at ittn : //www. sci .univr . it/ aovier/Bisii' The 
graphs for Test 1 (cf. Figure 5) that we present here are transitive closures
of binary trees. The graphs for Test 2 are obtained by linking with cycles the
nodes at the same level of the graphs of the first test. Then the “even” nodes
of these cycles are connected by an edge to a node of an acyclic linear graph.

Fig. 5. Two Tests (Time in Seconds).



8 Conclusion and Further Developments 

We proposed algorithms to determine the minimum, bisimulation-equivalent
representation of a directed graph or, equivalently, to test bisimilarity
between two directed graphs. The algorithms use algorithmic solutions to the
relational and single function coarsest partition problems as subroutines.
In the acyclic case the performance of the sketched algorithm is linear, while
in the cyclic case it turns out to be linear when there are no multiple nodes.
In general its performance is no worse than that of the best known solution for
the relational coarsest partition problem.

In [?], Fisler and Vardi compare three minimization algorithms with an
invariance checking algorithm (which does not use minimization) and argue that
the last is more efficient. The minimization algorithms they consider are those
of Paige and Tarjan [?], of Bouajjani, Fernandez and Halbwachs [?], and of Lee
and Yannakakis [?]. An important conclusion they draw is that even though the
last two algorithms are tailored to verification contexts, while the Paige and
Tarjan one is not, the latter performs better. This suggests that “minimization
algorithms tailored to verification settings should pay attention to choosing
splitters carefully”. We have presented here an algorithm which is not
specifically tailored to verification, but whose main difference with respect
to that of Paige and Tarjan is that it makes better choices of the splitters
and of the initial partition thanks to the notion of rank. In some cases we
obtain linear-time runs; moreover, the initial partition we use allows the
input to be processed without storing the entire structure in memory at the
same time.

Our next task will be the integration of this algorithm with symbolic
model-checking techniques. The applicability of the circle of ideas presented
here to the problem of determining simulations (cf. [?]) is also under
investigation.
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Abstract. Symmetry reduction methods exploit symmetry in a system 
in order to efficiently verify its temporal properties. Two problems may 
prevent the use of symmetry reduction in practice: (1) the property to 
be checked may distinguish symmetric states and hence not be preserved 
by the symmetry, and (2) the system may exhibit little or no symmetry. 
In this paper, we present a general framework that addresses both of 
these problems. We introduce “Guarded Annotated Quotient Structures” 
for compactly representing the state space of systems even when those 
are asymmetric. We then present algorithms for checking any temporal 
property on such representations, including non-symmetric properties. 



1 Introduction 

In the last few years there has been much interest in symmetry-based reduction
methods for model checking concurrent systems [?]. These methods exploit
automorphisms of the global state graph of the system to be verified, induced
by permutations on process indices and variables. Existing symmetry-reduction
methods, for verification of a correctness property given by a temporal formula
$\phi$, can be broadly classified into two categories: the first class of
methods consider only those automorphisms that preserve the atomic predicates
appearing in $\phi$, construct a Quotient Structure (QS) and check the formula
$\phi$ on the QS using traditional model-checking algorithms; the second class
of methods [?] consider all automorphisms, induced by process/variable
permutations, construct an Annotated Quotient Structure (AQS), and unwind it to
verify the formula $\phi$.

In this paper, we generalize symmetry-based reduction in several ways. First, 
the mathematical framework, used to formalize symmetry reduction, supports 
any automorphism on the system’s state graph; for example, automorphisms 
induced by permutations on variable-value pairs can be considered in addition 
to those induced by permutations on process indices and variables. Thus, this 
framework allows for more automorphisms and hence greater reduction. 

* Sistla’s work is supported in part by the NSF grant CCR-9988884 and was partly 
done while visiting Bell Laboratories. 
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Second, we introduce the notion of Guarded Annotated Quotient Structure 
(GQS) to represent, in a very compact way, the state graph of systems with
little or even no symmetry. In a nutshell, a GQS is an AQS whose edges are
also associated with a guard representing the condition under which the
corresponding original program transition is executable. Given a program $P$
and its reachability graph $G$, by adding edges to $G$ (via a transformation of
$P$), we obtain another graph $H$ that has more symmetry than $G$, and hence
can be represented more compactly. A GQS for $G$ can be viewed as an AQS for
$H$ whose edges are labeled with guards in such a way that the original edges
of $G$ can be recovered from the representation of $H$. To verify a temporal
formula $\phi$, the GQS is unwound as needed, by tracking the values of the
atomic predicates in $\phi$ and the guards of the GQS, so that only edges in
$G$ are considered. The GQS of $G$ can be much smaller than its QS because it
is defined from a larger set of automorphisms: a GQS is derived by considering
all the automorphisms of $H$, which exhibits more symmetry than $G$, including
those automorphisms that do not preserve the atomic predicates in $\phi$. We
show that unwinding the GQS on demand, in order to verify a property $\phi$,
can be done without ever generating a structure larger than the QS.

Third, we present two new techniques for further optimizing the model-checking
procedure using GQSs. These techniques minimize the amount of unwinding
necessary to check a formula $\phi$ and may yield an exponential improvement in
performance. The first technique, called formula decomposition, consists of
decomposing $\phi$ into groups of top-level sub-formulas so that atomic
predicates within a group are correlated; the satisfaction of $\phi$ can then
be checked by checking each group of sub-formulas separately, which in turn can
be done by successively unwinding the GQS with respect to only the predicates
appearing in each group separately; therefore, unwinding the GQS with respect
to all the atomic predicates appearing in $\phi$ simultaneously can be avoided.
The second technique, called sub-formula tracking, consists of identifying a
maximal set of “independent” sub-formulas of $\phi$ and unwinding the GQS by
tracking these sub-formulas only. These two complementary techniques can be
applied recursively.

The paper is organized as follows. Section 2 introduces the background infor- 
mation and notation. Section 3 introduces GQS and the model-checking method 
employing it. Section 4 presents the techniques based on formula decomposition 
and sub-formula tracking. Section 5 presents preliminary experimental results. 
Section 6 contains concluding remarks and related work. Proofs of theorems are 
omitted due to space limitations. 

2 Background 

A Kripke structure $K$ is a tuple $(S, E, \mathcal{P}, L)$ where $S$ is a set
of elements, called states, $E \subseteq S \times S$ is a set of edges,
$\mathcal{P}$ is a set of atomic propositions and $L : S \to 2^{\mathcal{P}}$
is a function that associates a subset of $\mathcal{P}$ with each state in $S$.
CTL* is a logic for specifying temporal properties of concurrent programs
(e.g., see [3]). It includes the temporal operators U (until), X (nexttime) and
the existential path quantifier E. Two types of CTL* formulas are defined
inductively: path formulas and state formulas. Every atomic proposition is a
state formula as well as a path formula. If $p$ and $q$ are state formulas
(resp., path formulas) then $p \wedge q$ and $\neg p$ are also state formulas
(resp., path formulas). If $p$ and $q$ are path formulas then $p\,U\,q$ and
$X p$ are path formulas and $E(p)$ is a state formula. Every state formula is
also a path formula. We use the abbreviation $EF(p)$ for $E(True\,U\,p)$ and
$AG(p)$ for $\neg EF(\neg p)$. A CTL* formula is a state formula. CTL is the
fragment of CTL* where all path formulas are of the form $p\,U\,q$ or of the
form $X p$ where $p$, $q$ are state formulas. CTL* formulas are interpreted
over Kripke structures (e.g., see [?] for a detailed presentation of the
semantics of CTL*).
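To make the abbreviations EF and AG concrete, here is a small Python sketch that evaluates them on an explicit Kripke structure. It is an illustration only: the function names and the set-based representation are assumptions of this example, and it uses the standard backward-reachability reading of E(True U p) rather than any algorithm from the paper.

```python
def sat_EF(states, edges, sat_p):
    """States satisfying EF(p) = E(True U p): every state from which some
    path reaches a state where p holds (backward reachability)."""
    preds = {s: set() for s in states}
    for s, t in edges:
        preds[t].add(s)
    result = set(sat_p)
    frontier = list(sat_p)
    while frontier:
        t = frontier.pop()
        for s in preds[t] - result:
            result.add(s)
            frontier.append(s)
    return result

def sat_AG(states, edges, sat_p):
    """AG(p) is the abbreviation for not EF(not p)."""
    states = set(states)
    return states - sat_EF(states, edges, states - set(sat_p))

# Hypothetical structure: 0 -> 1 -> 2 -> 2 and 3 -> 3; p holds only in state 2.
S = {0, 1, 2, 3}
E = {(0, 1), (1, 2), (2, 2), (3, 3)}
ef_p = sat_EF(S, E, {2})
ag_p = sat_AG(S, E, {2})
```

State 3 cannot reach state 2, so it does not satisfy EF(p); only state 2, whose single successor is itself, satisfies AG(p).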

Let $K = (S, R, \mathcal{P}, L)$ and $K' = (S', R', \mathcal{P}, L')$ be two
Kripke structures with the same set of atomic propositions. A bisimulation
between $K$ and $K'$ is a binary relation $U \subseteq S \times S'$ such that,
for every $(s, s') \in U$, the following conditions are all satisfied:
(1) $L(s) = L'(s')$; (2) for every $t$ such that $(s,t) \in R$, there exists
$t' \in S'$ such that $(t,t') \in U$ and $(s',t') \in R'$; and (3) for every
$t'$ such that $(s',t') \in R'$, there exists $t \in S$ such that
$(t,t') \in U$ and $(s,t) \in R$. We say that a state $s \in S$ is bisimilar to
a state $s' \in S'$ if there exists a bisimulation $U$ between $K$ and $K'$
such that $(s, s') \in U$. It is well known that bisimilar states satisfy the
same CTL* formulas.
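The three conditions can be turned directly into a naive fixpoint computation of the largest bisimulation between two finite Kripke structures. The sketch below is a deliberately simple illustration (quadratic refinement, not an efficient partition-refinement algorithm); the function name and the triple representation (states, edges, labeling dictionary) are assumptions of this example.

```python
def greatest_bisimulation(K1, K2):
    """Largest bisimulation between K1 and K2, each given as
    (states, edges, labeling) with labeling a dict state -> frozenset."""
    S1, R1, L1 = K1
    S2, R2, L2 = K2
    succ1 = {s: {t for (a, t) in R1 if a == s} for s in S1}
    succ2 = {s: {t for (a, t) in R2 if a == s} for s in S2}
    # Start from all pairs satisfying condition (1), then remove violations.
    B = {(s, t) for s in S1 for t in S2 if L1[s] == L2[t]}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(B):
            forth = all(any((x, y) in B for y in succ2[t]) for x in succ1[s])  # (2)
            back = all(any((x, y) in B for x in succ1[s]) for y in succ2[t])   # (3)
            if not (forth and back):
                B.discard((s, t))
                changed = True
    return B

# Hypothetical example: a 2-cycle and its unrolling into a 4-cycle.
P, NONE = frozenset({"p"}), frozenset()
K1 = ({0, 1}, {(0, 1), (1, 0)}, {0: P, 1: NONE})
K2 = ({"a", "b", "c", "d"},
      {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")},
      {"a": P, "b": NONE, "c": P, "d": NONE})
B = greatest_bisimulation(K1, K2)
```

Here state 0 is bisimilar to both a and c, and state 1 to both b and d, which is why the smaller structure can stand in for the larger one when checking CTL* formulas.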

We define a predicate over a set $S$ as a subset of $S$. Let $f$ be a bijection
on $S$, i.e., a one-to-one mapping from $S$ to $S$. Let $C$ be a predicate over
$S$. Let $f(C)$ denote the set $\{f(x) : x \in C\}$. Let $f^{-1}$ denote the
inverse of the bijection $f$. If $f$, $g$ are two bijections then we let $fg$
denote their composition in that order; note that in this case, $fg$ is also a
bijection. Throughout the paper we use the following identity relating the
inverse and composition operators:

$$(fg)^{-1} = g^{-1} f^{-1}.$$

Let $G = (S, E)$ be the reachability graph of a concurrent program where $S$
denotes a set of nodes/states and $E \subseteq S \times S$. An automorphism of
$G$ is a bijection $f$ on $S$ such that, for all $s, t \in S$, $(s,t) \in E$
iff $(f(s), f(t)) \in E$. We say that an automorphism $f$ respects a predicate
$C$ over $S$ if $f(C) = C$. The set of all automorphisms of a graph $G$ forms a
group $Aut(G)$. Given a set $P_1, \dots, P_k$ of predicates over $S$, the set
of automorphisms of $G$ that respect $P_1, \dots, P_k$ forms a subgroup of
$Aut(G)$.
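The two definitions can be checked mechanically on small graphs. The following Python sketch does so by brute force; representing a bijection as a dictionary and the names `is_automorphism` and `respects` are assumptions of this example.

```python
def is_automorphism(f, states, edges):
    """True if the dict f is a bijection on `states` with
    (s, t) an edge iff (f(s), f(t)) an edge."""
    states = set(states)
    if set(f) != states or set(f.values()) != states:
        return False
    E = set(edges)
    return all(((s, t) in E) == ((f[s], f[t]) in E)
               for s in states for t in states)

def respects(f, predicate):
    """f respects the predicate C when f(C) = C."""
    return {f[x] for x in predicate} == set(predicate)

# Hypothetical graphs on states {0, 1, 2}: G has one edge, H adds a second one.
states = {0, 1, 2}
G_edges = {(0, 1)}
H_edges = {(0, 1), (0, 2)}
swap = {0: 0, 1: 2, 2: 1}      # exchanges states 1 and 2
```

In this example the swap is an automorphism of the graph with the added edge but not of the original one, and it respects the predicate {1, 2} but not {1}.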

Let $\mathcal{G}$ be a group of automorphisms of $G$. We say that states
$s, t \in S$ are equivalent, denoted by $s \equiv_{\mathcal{G}} t$, if there
exists some $f \in \mathcal{G}$ such that $t = f(s)$. As observed in [?],
$\equiv_{\mathcal{G}}$ is an equivalence relation. A quotient structure of $G$
with respect to $\mathcal{G}$ is a graph $(\bar{S}, \bar{E})$ where $\bar{S}$
contains exactly one node in each equivalence class of $\equiv_{\mathcal{G}}$
and $(\bar{s}, \bar{t}) \in \bar{E}$ iff there exists some $t$ such that
$t \equiv_{\mathcal{G}} \bar{t}$ and $(\bar{s}, t) \in E$. Each state
$\bar{s} \in \bar{S}$ represents all states in $S$ that belong to its
equivalence class. Different quotient structures can be defined by choosing
different representatives for each equivalence class. However, all these
structures are isomorphic. We denote by $rep(s, \mathcal{G})$ the
representative element of the equivalence class to which $s$ belongs. In what
follows, $QS(G, \mathcal{G})$ denotes the quotient structure obtained by
choosing a unique representative for each equivalence class.

A predicate $P$ on the edges of $G$ is a subset of $S \times S$. We say that an
edge $(s,t)$ in $E$ satisfies $P$ if $(s,t) \in P$. Let $True$ denote the set
$S \times S$. For an edge predicate $P$ and automorphism $f$ on states, let
$f(P) = \{(f(s), f(t)) : (s,t) \in P\}$. Given a group $\mathcal{G}$ of
automorphisms on $G$, we can extend the equivalence relation
$\equiv_{\mathcal{G}}$ from states in $S$ to edges in $E$ as follows: two edges
$e = (s,t)$ and $e' = (s',t')$ are equivalent (written as
$e \equiv_{\mathcal{G}} e'$) if there exists some $g \in \mathcal{G}$ such that
$s' = g(s)$ and $t' = g(t)$. It is easy to see that $\equiv_{\mathcal{G}}$ on
$E$ is an equivalence relation.
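The quotient construction can be sketched in Python for a group that is listed explicitly. Several choices here are assumptions of the example, not of the paper: automorphisms are dictionaries, the list is assumed to contain the whole group (so that the orbit of a state is obtained by applying every element once), and the representative of a class is taken to be its minimum element.

```python
def quotient_structure(states, edges, group):
    """Quotient of the graph (states, edges) by an explicitly listed group.

    Returns (rep_states, rep_edges, rep) where rep maps every state to the
    representative of its equivalence class."""
    # Orbit of s = {f(s) : f in group}, since `group` lists every element.
    rep = {s: min(f[s] for f in group) for s in states}
    rep_states = set(rep.values())
    # (s, t') is a quotient edge iff some t equivalent to t' has (s, t) in edges.
    rep_edges = {(s, rep[t]) for (s, t) in edges if s in rep_states}
    return rep_states, rep_edges, rep

# Hypothetical diamond graph; exchanging states 1 and 2 is an automorphism.
states = {0, 1, 2, 3}
edges = {(0, 1), (0, 2), (1, 3), (2, 3)}
identity = {s: s for s in states}
swap12 = {0: 0, 1: 2, 2: 1, 3: 3}
q_states, q_edges, rep = quotient_structure(states, edges, [identity, swap12])
```

The diamond collapses to a chain of three representatives, with state 2 represented by state 1.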

3 Model Checking Using Guarded Annotated Quotient 
Structures 

In this section, we introduce Guarded Annotated Quotient Structures (GQS)
as extensions of Annotated Quotient Structures considered in [?]. These
structures can be defined with respect to arbitrary automorphisms and can
compactly represent the state space of systems that contain little symmetry.
For example,
consider a resource allocation system composed of a resource controller and three 
identical user processes, named a, b and c. When multiple user processes request 
the resource at the same time, the controller process allocates it to one of the re- 
questing users according to the following priority scheme: user a is given highest 
priority while users b and c have the same lower priority. This system exhibits 
some symmetry since users b and c are “interchangeable” . Now consider a similar 
system but where the three user processes are given equal priority. This system 
exhibits more symmetry since all three users are now “interchangeable” . Thus, 
the system without priorities has more symmetry than the system with prior- 
ities. A guarded annotated quotient structure allows us to verify systems with 
reduced symmetry (e.g., a system with priorities) by treating these as if they 
had more symmetry (e.g., a system without priorities) and without compromis- 
ing the accuracy of the verification results. For instance, in the state graph G, 
of the above resource allocation system with priorities, a state s where all three 
users have requested the resource has only one outgoing edge (granting the re- 
source to user a). By adding two other edges from s (granting the resource to 
the two other user processes), the state graph H of the system without priorities 
can be defined. Since H exhibits more symmetry than G, it can be verified more
efficiently. Thus, by viewing G as H extended with guards so that G can be 
re-generated if needed, model checking can be done more efficiently. 

Formally, let $H = (S, F)$ be a graph such that $F \supseteq E$ and
$Aut(G) \subseteq Aut(H)$, i.e., $H$ is obtained by adding edges to
$G = (S, E)$ such that every automorphism of $G$ is also an automorphism of
$H$.¹ Let $\mathcal{H}$ and $\mathcal{G}$ be groups of automorphisms of $H$ and
$G$, respectively, such that $\mathcal{H} \supseteq \mathcal{G}$. As indicated
earlier, $\equiv_{\mathcal{H}}$ defines equivalence relations on the nodes and
edges of $H$. For any edge $e \in F$, let $Class(e, \mathcal{H})$ denote the
set of edges in the equivalence class of $e$ defined by $\equiv_{\mathcal{H}}$.
Let $\mathcal{Q} = \{Q_1, \dots, Q_l\}$ be a set of predicates on $S$ such that
each automorphism in $\mathcal{G}$ also respects all the predicates in
$\mathcal{Q}$. Let $QS(G, \mathcal{G}) = (U, \bar{E})$ be the quotient
structure of $G$ with respect to $\mathcal{G}$ as defined earlier.

¹ Our results can easily be extended to allow the addition of nodes as well as
edges. Note that adding edges/nodes to a graph may sometimes reduce symmetry.

A Guarded Annotated Quotient Structure of $H = (S, F)$ with respect to
$\mathcal{H}$, denoted by $GQS(H, \mathcal{H})$, is a triple $(V, \bar{F}, C)$
where $V \subseteq S$ is a set of states that contains one representative for
each equivalence class of states defined by $\equiv_{\mathcal{H}}$ on $S$,
$\bar{F} \subseteq V \times V \times \mathcal{H}$ is a set of labeled edges
such that, for every $\bar{s} \in V$ and $t \in S$ such that
$(\bar{s}, t) \in F$, there exists an element
$(\bar{s}, \bar{t}, f) \in \bar{F}$ such that $f(\bar{t}) = t$, and $C$ is a
function that associates a predicate $C(e)$ with each labeled edge
$e \in \bar{F}$ such that (1) $C(e) \cap Class(e, \mathcal{H}) = E \cap
Class(e, \mathcal{H})$ (i.e., $C(e)$ denotes all edges in
$Class(e, \mathcal{H})$ that are edges in the original graph $G$) and (2), for
all $g \in \mathcal{G}$, $g(C(e)) = C(e)$ (i.e., $g$ respects the edge
predicate $C(e)$).

Given a labeled edge $e = (\bar{s}, \bar{t}, f) \in \bar{F}$,
$f \in \mathcal{H}$ is called the label of $e$ and denotes an automorphism that
can be used to obtain the corresponding original edge in $F$; the edge
predicate $C(e)$ can in turn be used to determine whether this edge is also an
edge of $G$. Labels of edges in $\bar{F}$ and the edge predicate $C$ are used
to unwind $GQS(H, \mathcal{H})$ when necessary during model checking, as
described later. Note that edge predicates $C$ that satisfy the above
conditions always exist: for instance, taking $C(e) = E$ always satisfies the
definition. In practice, a compact representation of an edge predicate $C$
satisfying the conditions above can be obtained directly from the description
of the concurrent program. For example, in the case of the resource allocation
system, the edge predicate $C(e)$ is defined as follows: if the labeled edge
$e$ denotes the allocation of the resource to a user, then $C(e)$ asserts that
if there is a request from user a then a is allocated the resource; for all
other labeled edges, $C(e)$ is the predicate $True$. Similarly, the
automorphisms labeling edges in $\bar{F}$ can also have succinct implicit
representations. For example, any automorphism induced by permutations of $n$
process indices as considered in [?] can be represented by an array of $n$
variables ranging over $n$. Tools like SMC [?] and Murphi [?] include optimized
algorithms for representing and manipulating such sets of permutations.
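The guard of the resource allocation example can be written down directly. The Python sketch below is a toy model of a single allocation step, with hypothetical function names: the symmetric system may grant the resource to any requesting user, and the guard keeps only the grants that are also possible in the system with priorities.

```python
def grants_without_priority(requests):
    """Edges of the more symmetric system: any requesting user may be served."""
    return set(requests)

def guard(requests, granted):
    """The edge predicate for allocation edges: if user a has requested the
    resource, then a is the user that is allocated the resource."""
    return "a" not in requests or granted == "a"

def grants_with_priority(requests):
    """Edges of the original system, recovered by filtering with the guard."""
    return {u for u in grants_without_priority(requests) if guard(requests, u)}

all_three = grants_with_priority({"a", "b", "c"})
only_b_and_c = grants_with_priority({"b", "c"})
```

When all three users request the resource only the grant to a survives, matching the single outgoing edge described above; when a is not requesting, b and c remain interchangeable.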

Given a set $\mathcal{Q}$ of predicates over $S$ that are all respected by the
automorphisms in $\mathcal{G}$, we define three Kripke structures
$K\_Stru(G, \mathcal{Q})$, $QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ and
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ derived from $G = (S, E)$,
$QS(G, \mathcal{G}) = (U, \bar{E})$ and
$GQS(H, \mathcal{H}) = (V, \bar{F}, C)$, respectively. We show that these three
Kripke structures are pairwise bisimilar, and hence can all be used for CTL*
model checking. Since $\mathcal{G}$ is a subgroup of $\mathcal{H}$, each
equivalence class of $\equiv_{\mathcal{H}}$ is a union of smaller equivalence
classes defined by $\equiv_{\mathcal{G}}$. Thus, the number of equivalence
classes of $\equiv_{\mathcal{H}}$ is smaller than that of
$\equiv_{\mathcal{G}}$, and $GQS(H, \mathcal{H})$ contains (possibly
exponentially) fewer nodes than $QS(G, \mathcal{G})$. $QS(G, \mathcal{G})$
itself can be much smaller than $G$.

For each predicate $Q_j$ ($1 \le j \le l$) in $\mathcal{Q}$, we introduce an
atomic proposition denoted $q_j$. Let $X = \{q_i : 1 \le i \le l\}$. Let
$K\_Stru(G, \mathcal{Q})$ denote the Kripke structure $(S, E, X, L)$ where for
any $s \in S$, $L(s) = \{q_j : s \in Q_j\}$. The Kripke structure
$QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ is given by $(U, \bar{E}, X, M)$ where
$M(s) = \{q_j : s \in Q_j\}$. The following theorem has been proven in [?].

Theorem 1. There exists a bisimulation between the structures
$K\_Stru(G, \mathcal{Q})$ and $QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ such that
every state $s \in S$ is bisimilar to its representative in $U$.

Therefore, any CTL* formula over atomic propositions in $X$ is satisfied at
a state $s$ in $K\_Stru(G, \mathcal{Q})$ iff it is satisfied at its
representative $rep(s, \mathcal{G})$ in
$QS\_Stru(G, \mathcal{G}, \mathcal{Q})$.

If the edge predicate $C$ is implicitly represented by a collection of edge
predicates $\Theta_1, \dots, \Theta_r$, the Kripke structure
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ is obtained from
$GQS(H, \mathcal{H})$ by partially unwinding it and by tracking the node
predicates in $\mathcal{Q}$ (i.e., the predicates $Q_1, \dots, Q_l$) and the
edge predicates $\Theta_1, \dots, \Theta_r$ during this unwinding process. In
other words, the unwinding is performed with respect to the predicates
$Q_1, \dots, Q_l$ and $\Theta_1, \dots, \Theta_r$, not with respect to the
states of $G$, in order to limit the unwinding as much as possible. This
partial unwinding can be viewed as a particular form of “predicate
abstraction”, and is a generalization of the unwinding process described in
[?]. Precisely, the Kripke structure $GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$
is the tuple $(W, T, X, N)$ where $W$, $T$ and $N$ are defined as follows:

- For all $s \in V$, $(s, Q_1, \dots, Q_l, \Theta_1, \dots, \Theta_r) \in W$.

- Let $u = (s, X_1, \dots, X_l, \Phi_1, \dots, \Phi_r)$ be any node in $W$,
$e = (s, t, f)$ be a labeled edge in $\bar{F}$ and $j$ be an integer such that
$\Theta_j$ is the edge predicate $C(e)$. Further, assume that the edge
$(s, f(t))$ satisfies the predicate $\Phi_j$. For all such $u$ and $e$, the
node $v = (t, f^{-1}(X_1), \dots, f^{-1}(X_l), f^{-1}(\Phi_1), \dots,
f^{-1}(\Phi_r))$ is in $W$ and the edge $(u, v)$ is in $T$.

- For all $u = (s, X_1, \dots, X_l, \Phi_1, \dots, \Phi_r) \in W$,
$N(u) = \{q_i : s \in X_i\}$.

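The following theorem states that $QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ and
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ are bisimilar.

Theorem 2. Given $QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ and
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ as previously defined, let
$Z \subseteq U \times W$ be a binary relation defined such that
$(s, u) \in Z$ iff there exists an automorphism $f \in \mathcal{H}$ such that
$f(t) = s$ and $u = (t, f^{-1}(Q_1), \dots, f^{-1}(Q_l), f^{-1}(\Theta_1),
\dots, f^{-1}(\Theta_r))$. Then, the following properties hold:

1. $Z$ is a bisimulation between $QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ and
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$.

2. For all $u \in W$, there exists a node $s \in U$ such that $(s, u) \in Z$.

3. Two nodes $u = (t, X_1, \dots, X_l, \Phi_1, \dots, \Phi_r)$ and
$u' = (t', Y_1, \dots, Y_l, \Delta_1, \dots, \Delta_r)$ of
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ are related to a single node $s$ of
$QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ through $Z$ iff $t = t'$ and there
exists some $h$ in $\mathcal{H}$ such that $h(t) = t$ and $X_i = h(Y_i)$ for
all $i = 1, \dots, l$, and $\Phi_j = h(\Delta_j)$ for all $j = 1, \dots, r$.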

From the previous theorem, we see that multiple nodes in
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ can be related through $Z$ to a single
node in $QS\_Stru(G, \mathcal{G}, \mathcal{Q})$. Hence, in principle,
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ can sometimes have more nodes than
$QS\_Stru(G, \mathcal{G}, \mathcal{Q})$. The following construction can be used
to further reduce the number of nodes in
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ so that the reduced structure has no
more nodes than $QS\_Stru(G, \mathcal{G}, \mathcal{Q})$. First, observe that
all the nodes in $GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ that



Symmetry and Reduced Symmetry in Model Checking 



97 



are related through $Z$ to a single node $s$ in
$QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ can be represented by a single node
since they are all bisimilar to each other. The algorithm for generating
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ can be modified to apply this
reduction to construct a smaller Kripke structure
$Greduced\_Stru(H, \mathcal{H}, \mathcal{Q})$. Nodes in
$GQS\_Stru(H, \mathcal{H}, \mathcal{Q})$ that are related to a single node in
$QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ can be detected by evaluating the
condition stated in Part 3 of Theorem 2. It can be shown that, if $\mathcal{G}$
is the maximal subgroup of $\mathcal{H}$ consisting of all automorphisms of $G$
that respect $Q_1, \dots, Q_l$, then
$Greduced\_Stru(H, \mathcal{H}, \mathcal{Q})$ has the same number of nodes as
$QS\_Stru(G, \mathcal{G}, \mathcal{Q})$ and $Z$ defines an isomorphism between
the two structures; otherwise, $Greduced\_Stru(H, \mathcal{H}, \mathcal{Q})$
has fewer nodes than $QS\_Stru(G, \mathcal{G}, \mathcal{Q})$.

In summary, the procedure for incrementally constructing the reachable part
of $Greduced\_Stru(H, \mathcal{H}, \mathcal{Q})$ from $GQS(H, \mathcal{H})$ is
the following. We maintain a set To_explore of nodes that have yet to be
treated. Initially, To_explore contains nodes of the form
$(s_0, Q_1, \dots, Q_l, \Theta_1, \dots, \Theta_r)$ where $s_0$ is the
representative of an equivalence class containing an initial state. We iterate
the following procedure until To_explore is empty. We remove a node
$u = (s, X_1, \dots, X_l, \Phi_1, \dots, \Phi_r)$ from To_explore. For each
labeled edge $e = (s, t', f)$ in $GQS(H, \mathcal{H})$, we check if the edge
$(s, f(t'))$ satisfies the edge predicate $\Phi_j$, where $j$ is the index such
that $\Theta_j$ is the edge predicate $C(e)$. If this condition is satisfied we
do as follows. We construct the node
$v = (t', Y_1, \dots, Y_l, \Delta_1, \dots, \Delta_r)$ where
$Y_i = f^{-1}(X_i)$ for $1 \le i \le l$ and $\Delta_j = f^{-1}(\Phi_j)$ for
$1 \le j \le r$. Then, we check if there exists a node
$w = (t', Z_1, \dots, Z_l, \Psi_1, \dots, \Psi_r)$ in the partially constructed
$Greduced\_Stru(H, \mathcal{H}, \mathcal{Q})$ and an $h \in \mathcal{H}$ such
that $t' = h(t')$ and $Z_i = h(Y_i)$ for all $i = 1, \dots, l$, and
$\Psi_j = h(\Delta_j)$ for all $j = 1, \dots, r$ (i.e., the condition of Part 3
of Theorem 2 is checked). If this condition is satisfied, we add an edge from
$u$ to $w$; otherwise, we add $v$ as a new node, include it in To_explore and
add an edge from $u$ to $v$.
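The procedure just summarized is a worklist algorithm, and a small explicit-state Python sketch may help to follow it. Everything about the representation is an assumption of this example: automorphisms are dictionaries, the group is listed explicitly and contains the identity, predicates are frozensets (of states or of edges), each labeled edge carries the index j of its guard, and the merge test is done by brute force over the group.

```python
def inverse(f):
    return {v: k for k, v in f.items()}

def img_nodes(f, X):
    return frozenset(f[x] for x in X)

def img_edges(f, P):
    return frozenset((f[a], f[b]) for (a, b) in P)

def same_up_to(w, v, group):
    """Merge condition: same state t, and some h in the group with h(t) = t
    maps every predicate of v onto the corresponding predicate of w."""
    t = v[0]
    return w[0] == t and any(
        h[t] == t
        and all(Z == img_nodes(h, Y) for Z, Y in zip(w[1], v[1]))
        and all(P == img_edges(h, D) for P, D in zip(w[2], v[2]))
        for h in group)

def unwind(initial_reps, labeled_edges, Q, Theta, group):
    """labeled_edges: list of (s, t, f, j), with Theta[j] the guard of the edge.
    A node is (state, tuple of node predicates, tuple of edge predicates)."""
    start = [(s, tuple(Q), tuple(Theta)) for s in initial_reps]
    nodes, edges, to_explore = list(start), set(), list(start)
    while to_explore:
        u = to_explore.pop()
        s, Xs, Phis = u
        for (src, t, f, j) in labeled_edges:
            if src != s or (s, f[t]) not in Phis[j]:
                continue                       # guard fails: not an edge of G
            g = inverse(f)
            v = (t, tuple(img_nodes(g, X) for X in Xs),
                    tuple(img_edges(g, P) for P in Phis))
            w = next((n for n in nodes if same_up_to(n, v, group)), None)
            if w is None:                      # genuinely new node
                nodes.append(v)
                to_explore.append(v)
                w = v
            edges.add((u, w))
    return nodes, edges

# Hypothetical example: G has the single edge 0 -> 1, H also has 0 -> 2.
ident = {0: 0, 1: 1, 2: 2}
swap = {0: 0, 1: 2, 2: 1}
gqs_edges = [(0, 1, ident, 0), (0, 1, swap, 0)]      # representatives are 0 and 1
Q1 = frozenset({1})                                  # tracked node predicate
guard_E = frozenset({(0, 1)})                        # guard C(e) = E
nodes, edges = unwind([0], gqs_edges, [Q1], [guard_E], [ident, swap])
```

In this example only the edge labeled with the identity passes the guard, so the unwinding produces two nodes and one edge, mirroring the original graph; the edge labeled with the swap corresponds to 0 -> 2, which exists only in the graph with added edges.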

Consider a CTL* formula $\phi$ defined over a set $prop(\phi)$ of atomic
propositions that each correspond to a predicate in $\mathcal{Q}$. Let
$pred(\phi) \subseteq \mathcal{Q}$ denote the set of predicates corresponding
to $prop(\phi)$. From Theorem 2, it is easy to see that the formula $\phi$ is
satisfied at node $s$ in $K\_Stru(G, \mathcal{Q})$ iff it is satisfied at the
node $u = (rep(s, \mathcal{H}), f^{-1}(R_1), \dots, f^{-1}(R_m),
f^{-1}(\Theta_1), \dots, f^{-1}(\Theta_r))$ in the structure
$GQS\_Stru(H, \mathcal{H}, \mathcal{R})$ with
$\mathcal{R} = \{R_1, \dots, R_m\}$, where $f$ is the automorphism such that
$s = f(rep(s, \mathcal{H}))$. Thus, model checking the CTL* formula $\phi$ can
be done on the Kripke structures $GQS\_Stru(H, \mathcal{H}, pred(\phi))$ or
$Greduced\_Stru(H, \mathcal{H}, pred(\phi))$ obtained by unwinding
$GQS(H, \mathcal{H})$ with respect to the set $pred(\phi)$ of predicates only.
Let us call this the direct approach.

4 Formula Decomposition and Sub-formula Tracking 

In this section, we discuss two complementary techniques that can improve the 
direct approach of the previous section. 
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4.1 Formula Decomposition 

Any CTL* state formula $\phi$ can be rewritten as a boolean combination of
atomic propositions and existential sub-formulas of the form $E(p)$. Let
$Eform(\phi)$ denote the set of existential sub-formulas of $\phi$ that are not
sub-formulas of any other existential sub-formula of $\phi$ (i.e., they are the
top-level existential sub-formulas of $\phi$). Checking whether a state $s$
satisfies a state formula $\phi$ can be done by checking whether $s$ satisfies
each sub-formula in $Eform(\phi)$ separately, and then combining the results.

For each $\phi' \in Eform(\phi)$, we can determine whether $s$ satisfies
$\phi'$ in the structure $K\_Stru(G, \mathcal{Q})$ by unwinding
$GQS(H, \mathcal{H})$, with respect to the predicates in $pred(\phi')$ only, to
obtain the Kripke structure $GQS\_Stru(H, \mathcal{H}, pred(\phi'))$ and by
checking if the corresponding node satisfies $\phi'$ in this structure.
Formulas in $Eform(\phi)$ that have the same set of atomic propositions can be
grouped and their satisfaction can be checked at the same time using the same
unwinding. Obviously, unwinding with respect to smaller sets of predicates can
yield dramatic performance improvements.

Correlations between predicates can also be used to limit the number of
unwindings necessary for model checking. Two predicates $Q_i$ and $Q_j$ in
$\mathcal{Q}$ are correlated if, for all $f \in \mathcal{H}$, $f(Q_i) = Q_i$
iff $f(Q_j) = Q_j$. It is easy to see that the relation “correlated” is an
equivalence relation. We say that two atomic propositions are correlated if
their corresponding predicates are correlated. Correlations between predicates
can sometimes be detected very easily. For instance, with the framework of [?]
where automorphisms induced by process permutations are considered, two
predicates referring to variables of the same process are correlated: the
predicates $x[1] = 5$ and $y[1] = 10$ are correlated if $x[1]$ and $y[1]$ refer
to the local variables $x$ and $y$ of process 1, respectively.
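The definition of correlated predicates can be checked by brute force on a tiny two-process system. The sketch below is illustrative only: the state encoding, the value ranges and all names are assumptions, and the group contains just the identity and the permutation that exchanges the two processes.

```python
from itertools import product

def image(f, pred):
    """Image f(Q) of a predicate Q (a set of states) under the dict f."""
    return frozenset(f[s] for s in pred)

def correlated(Qi, Qj, group):
    """Qi and Qj are correlated if every automorphism in the group either
    fixes both predicates or fixes neither of them."""
    return all((image(f, Qi) == Qi) == (image(f, Qj) == Qj) for f in group)

# A state is ((x1, y1), (x2, y2)): the local variables of processes 1 and 2.
local = list(product((0, 5), (0, 10)))
states = list(product(local, local))
identity = {s: s for s in states}
swap = {s: (s[1], s[0]) for s in states}            # exchange the two processes
group = [identity, swap]

x1_is_5 = frozenset(s for s in states if s[0][0] == 5)     # x[1] = 5
y1_is_10 = frozenset(s for s in states if s[0][1] == 10)   # y[1] = 10
some_x_is_5 = frozenset(s for s in states if 5 in (s[0][0], s[1][0]))
```

The two predicates about process 1 are correlated, since the swap preserves neither of them. The symmetric predicate "some process has x = 5" is preserved by the swap, so it is not correlated with the predicate x[1] = 5.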

If two predicates $Q_i$ and $Q_j$ are correlated, the following property can be
proven: if $C$ is a subset of $\mathcal{Q}$ containing $Q_i$ and
$C' = C \cup \{Q_j\}$, then the Kripke structures obtained by unwinding with
respect to either $C$ or $C'$ will be isomorphic. The above property allows us
to combine unwindings corresponding to different formulas in $Eform(\phi)$
whose atomic propositions are correlated. First, we define an equivalence
relation among formulas in $Eform(\phi)$: two formulas $x$ and $y$ in
$Eform(\phi)$ are equivalent if every atomic proposition in $x$ is correlated
to some atomic proposition in $y$, and vice versa. This equivalence relation
partitions $Eform(\phi)$ into disjoint groups $G_1, \dots, G_w$. Let
$pred(G_i) = \bigcup \{pred(\phi') : \phi' \in G_i\}$. Now for each group
$G_i$, we can unwind $GQS(H, \mathcal{H})$ with respect to $pred(G_i)$ and
check whether each formula in $G_i$ is satisfied at $rep(s, \mathcal{H})$.

The number of unwindings can be further reduced by ordering the groups
$G_1, \ldots, G_w$ as follows. We say that $G_i$ is above $G_j$ if every predicate in $pred(G_j)$
is correlated to some predicate in $pred(G_i)$. The relation “above” is a partial
order. We call $G_i$ a top-group if there is no other group above it. Observe that, if $G_i$ is
above $G_j$, we can combine their unwindings. Hence, if $H_1, \ldots, H_v$ denote the top-groups
defined by the groups $G_1, \ldots, G_w$ ($v \le w$), we can unwind $GQS(H, \mathcal{H})$
with respect to the predicates in $pred(H_i)$ for each group $H_i$ separately, and
check the satisfaction in state $s$ of each formula in $H_i$ and in all the groups $G_j$
“below” it using this unwinding.
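
A minimal sketch of the top-group selection follows. It assumes that each group is summarized by the set of correlation classes of its predicates (for process permutations, a set of process indices), so that "above" becomes set inclusion. The function names are hypothetical.

```python
def is_above(sig_i, sig_j):
    """G_i is above G_j if every class of pred(G_j) also occurs in pred(G_i)."""
    return sig_j <= sig_i

def top_groups(signatures):
    """Groups that have no other group above them."""
    return [s for s in signatures
            if not any(t != s and is_above(t, s) for t in signatures)]

def assign_to_top(signatures):
    """Map each top-group to the groups that can reuse its unwinding."""
    return {t: [s for s in signatures if is_above(t, s)]
            for t in top_groups(signatures)}

# Hypothetical groups, each given by the process indices its predicates use.
sigs = [frozenset({1}), frozenset({2}), frozenset({1, 2}), frozenset({3})]
plan = assign_to_top(sigs)
for top, below in plan.items():
    print(sorted(top), [sorted(b) for b in below])
```

In this example four groups require only two unwindings, one per top-group.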

Note that using the formula decomposition technique can sometimes be less 
efficient than the direct approach of the previous section. This can be the case 
when there is a lot of overlap between the sets $pred(H_i)$ of predicates corresponding
to the groups $H_i$ obtained after partitioning $Eform(\phi)$.

4.2 Sub-formula Tracking 

A CTL* formula sometimes itself exhibits some internal symmetry. Exploiting
formula symmetry was already proposed in [?]. Here, we generalize these ideas
by presenting a unified unwinding process where decomposition and symmetry
in a formula can both be exploited simultaneously.

Let $\phi$ be a CTL* formula. Consider two state sub-formulas $\phi'$ and $\phi''$ of $\phi$. We
say that $\phi'$ dominates $\phi''$ in $\phi$ if $\phi''$ is a sub-formula of $\phi'$ and every occurrence
of $\phi''$ in $\phi$ is inside an occurrence of $\phi'$. We say that $\phi'$ and $\phi''$ are independent
in $\phi$ if neither of them dominates the other in $\phi$. Thus, formulas that are not
sub-formulas of each other are independent. Note that even if a formula is a
sub-formula of another formula, it is possible for them to be independent: for
instance, in the formula $q$ given by $E(EG\, q_1 \; U \; E(q_1 \, U \, q_2))$, the state sub-formulas $q_1$
and $E(q_1 \, U \, q_2)$ are independent since there is an occurrence of $q_1$ which does not
appear in the context of $E(q_1 \, U \, q_2)$. Let $Sform(\phi)$ be the set of all sub-formulas
of $\phi$ that are state formulas. Let $\mathcal{R}$ be a subset of $Sform(\phi)$. We say that $\mathcal{R}$
is a maximal independent set if it is a maximal subset of $Sform(\phi)$ such that
the state formulas in $\mathcal{R}$ are all pairwise independent. There can be many such
maximal independent subsets of $Sform(\phi)$. For instance, the set of all atomic
propositions appearing in $\phi$ is obviously a maximal independent set. For the
formula $q$ given above, the set consisting of $EG\, q_1$ and $E(q_1 \, U \, q_2)$ is a maximal
independent set.
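
The dominance and independence relations can be checked mechanically on a syntax tree. The sketch below assumes a hypothetical encoding of formulas as nested tuples `(operator, child, ...)` with atomic propositions as strings; it is meant only to illustrate the definitions, not to reproduce any implementation from the paper.

```python
def children(phi):
    """Immediate sub-terms of a formula; atoms (strings) have none."""
    return phi[1:] if isinstance(phi, tuple) else ()

def is_subformula(inner, outer):
    return inner == outer or any(is_subformula(inner, c) for c in children(outer))

def occurs_outside(phi, inner, outer):
    """True if some occurrence of `inner` in `phi` is not inside `outer`."""
    if phi == outer:
        return False  # everything below this point lies inside `outer`
    if phi == inner:
        return True
    return any(occurs_outside(c, inner, outer) for c in children(phi))

def dominates(phi, a, b):
    """`a` dominates `b` in `phi`."""
    return a != b and is_subformula(b, a) and not occurs_outside(phi, b, a)

def independent(phi, a, b):
    return not dominates(phi, a, b) and not dominates(phi, b, a)

# The formula E(EG q1 U E(q1 U q2)) from the text.
eg_q1 = ("E", ("G", "q1"))
e_until = ("E", ("U", "q1", "q2"))
q = ("E", ("U", eg_q1, e_until))

print(independent(q, "q1", e_until))  # q1 also occurs inside EG q1
print(independent(q, "q2", e_until))  # q2 occurs only inside E(q1 U q2)
```

The first call reproduces the observation in the text that $q_1$ and $E(q_1 \, U \, q_2)$ are independent, whereas $q_2$ is dominated by $E(q_1 \, U \, q_2)$.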

In what follows, we are interested in exploiting “good” maximal independent 
sets, i.e., sets $\mathcal{R}$ whose elements are symmetric or partially symmetric. A formula
$q$ is symmetric if, for every automorphism $f$ in $\mathcal{G}$, $f(q) = q$; it is partially
symmetric when this property holds for almost all $f$ in $\mathcal{G}$. In general, detecting
whether a sub-formula is symmetric is computationally hard. However, when
syntactically symmetric constructs (similar to those in ICTL* [?]) are used, it
is easy to determine whether a sub-formula is symmetric. For instance,
when only process permutations are used as automorphisms (as in [?]), the
sub-formula $\bigwedge_{i \in I} h(i)$ is symmetric when $I$ is the set of all process indices and
$h(i)$ is a formula that only refers to the local variables of process $i$; the same
sub-formula is partially symmetric when $I$ contains most process indices.

Let $\mathcal{R} = \{R_1, \ldots, R_m\}$ be a (preferably good) maximal independent set of
sub-formulas of $\phi$. We also view each element $R_i$ of $\mathcal{R}$ as a predicate, i.e., as the
set of states that satisfy the CTL* formula $R_i$. Consider the Kripke structure
$GQS\_Stru(H, \mathcal{H}, \mathcal{R})$ obtained by unwinding $GQS(H, \mathcal{H})$ with respect to $\mathcal{R}$. In
a similar way, we can define $Greduced\_Stru(H, \mathcal{H}, \mathcal{R})$ following the procedure of
Section 3.
Let $\psi$ denote the formula obtained from $\phi$ by replacing every occurrence of
the sub-formula $R_i$ by a fresh atomic proposition $r_i$, for all $i = 1, \ldots, m$. The
following theorem relates the satisfaction of $\phi$ and $\psi$.

Theorem 3. Let $s$ be a state in $S$ and $f$ be an automorphism in $\mathcal{H}$ such that
$s = f(rep(s, \mathcal{H}))$. Then, the formula $\phi$ is satisfied at state $s$ in the structure
$K\_Stru(G, \mathcal{Q})$ iff $\psi$ is satisfied at the node $u = (rep(s, \mathcal{H}), \ldots, f^{-1}(\Phi_r))$
in the structure $GQS\_Stru(H, \mathcal{H}, \mathcal{R})$ iff $\psi$ is satisfied at
node $u$ in the structure $Greduced\_Stru(H, \mathcal{H}, \mathcal{R})$.

Thus, the previous theorem makes it possible to check a formula $\phi$ “hierarchically”,
by recursively checking sub-formulas $R_i$ and then combining the results
via the unwinding of $GQS(H, \mathcal{H})$ with respect to $\mathcal{R}$ only.
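
The replacement of the sub-formulas $R_i$ by fresh atomic propositions can be sketched as follows. The nested-tuple encoding of formulas, the operator names, and the function `substitute` are assumptions made for this illustration only.

```python
def substitute(phi, fresh):
    """Replace every occurrence of a sub-formula listed in `fresh` by its fresh atom."""
    if phi in fresh:
        return fresh[phi]
    if isinstance(phi, tuple):
        # Keep the operator, rewrite the operands recursively.
        return (phi[0],) + tuple(substitute(c, fresh) for c in phi[1:])
    return phi

# Hypothetical formula E(q1 U (E h2 AND E h3)); "h2" and "h3" stand for path formulas.
r2_body = ("AND", ("E", "h2"), ("E", "h3"))
phi = ("E", ("U", "q1", r2_body))

# R_1 = q1 and R_2 = (E h2 AND E h3) are replaced by fresh atoms r1 and r2.
psi = substitute(phi, {"q1": "r1", r2_body: "r2"})
print(psi)
```

The formula `psi` contains only the fresh propositions, so it can be checked on the unwound structure once the truth of each $R_i$ at the relevant states is known.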

We now discuss the construction of the structures $GQS\_Stru(H, \mathcal{H}, \mathcal{R})$ and
$Greduced\_Stru(H, \mathcal{H}, \mathcal{R})$. The states of both of these structures are of the form
$(s, X_1, \ldots, X_m, \Phi_1, \ldots, \Phi_r)$, where each $X_i$ is a CTL* state formula obtained by
applying some automorphism to $R_i$ during the unwinding process. Remember
that, during the construction process, we need to be able to check whether a
newly generated node $v = (t, Y_1, \ldots, Y_m, \Psi_1, \ldots, \Psi_r)$ is the same as some previously
generated node $u = (s, X_1, \ldots, X_m, \Phi_1, \ldots, \Phi_r)$, i.e., whether $s = t$, $Y_i = X_i$
for all $i = 1, \ldots, m$ and $\Psi_j = \Phi_j$ for all $j = 1, \ldots, r$. Checking whether $s = t$
and $\Psi_j = \Phi_j$ for all $j = 1, \ldots, r$ can usually be done efficiently as previously
discussed. However, checking whether $Y_i = X_i$ can be hard since each of these
can now be any CTL* state formula, and checking equivalence of such formulas
is computationally hard in general. Note that, if the CTL* formula $\phi$ uses
syntactically symmetric constructs such as those in ICTL* [?], then this check
can always be done efficiently.

Another important aspect in the construction of $GQS\_Stru(H, \mathcal{H}, \mathcal{R})$ is the
generation of $N(s)$ for each state $s$. For a node $u = (s, X_1, \ldots, X_m, \Phi_1, \ldots, \Phi_r)$,
$r_i \in N(u)$ iff $s \in X_i$. Since $X_i$ can now be any CTL* state formula, this means
that $s \in X_i$ iff $s$ satisfies the formula $X_i$ in the Kripke structure $K\_Stru(G, \mathcal{Q})$.
Since $X_i$ is obtained by applying a sequence of automorphisms in $\mathcal{H}$ to the state
sub-formula $R_i$ of $\phi$, we know that $X_i = f(R_i)$ for some $f \in \mathcal{H}$. This automorphism
$f$ can be made available at the time of generation of $u$ by maintaining
automorphisms with states in the set To-explore used in the algorithm for generating
$Greduced\_Stru(H, \mathcal{H}, \mathcal{Q})$ given in Section 3. Thus, checking whether $s \in X_i$
reduces to checking whether $s$ satisfies the sub-formula $f(R_i)$ in $K\_Stru(G, \mathcal{Q})$,
which itself holds iff $f^{-1}(s)$ satisfies $R_i$ in $K\_Stru(G, \mathcal{Q})$. The latter can be
checked by recursively applying the above procedure to $R_i$ instead of $\phi$.
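
The reduction from "$s$ satisfies $f(R_i)$" to "$f^{-1}(s)$ satisfies $R_i$" can be illustrated for process permutations. In the sketch below a global state is a tuple of local states, a permutation is a tuple mapping process $i$ to `perm[i]`, and a predicate is a Python function on states; all of these encodings and names are assumptions made for the illustration.

```python
from itertools import permutations, product

def apply(perm, state):
    """Move the local state of process i to position perm[i]."""
    out = [None] * len(state)
    for i, local in enumerate(state):
        out[perm[i]] = local
    return tuple(out)

def inverse(perm):
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return tuple(inv)

def image(perm, predicate, states):
    """f(R): the set of states obtained by applying f to every state of R."""
    return {apply(perm, s) for s in states if predicate(s)}

def holds_via_inverse(perm, predicate, state):
    """Decide s in f(R) by checking f^{-1}(s) in R, without building f(R)."""
    return predicate(apply(inverse(perm), state))

states = list(product("NC", repeat=3))  # N = non-critical, C = critical
R = lambda s: s[0] == "C"               # hypothetical predicate: process 0 is critical

ok = all((s in image(p, R, states)) == holds_via_inverse(p, R, s)
         for p in permutations(range(3)) for s in states)
print(ok)
```

The second test never enumerates $f(R_i)$, which is the point of carrying the automorphism $f$ along with each node.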

We thus obtain a complete recursive procedure which constructs different
structures corresponding to the different sub-formulas $R_i$ of $\phi$. Note that the
formula decomposition technique of Section 4.1 can be used to decompose sub-formulas
$R_i$. Thus, formula decomposition and sub-formula tracking are complementary
and can both be applied recursively. It is to be noted that if no good
maximal independent set $\mathcal{R}$ can be found then the procedure of Subsection 4.1
should be applied directly.
Example

We illustrate the method by a brief example. Assume that we are using automorphisms
induced by process permutations, as in [?]. Consider a concurrent
system of $n$ processes. Consider the problem of model-checking with respect to
the formula $\phi$ given by $E(q_1 \, U \bigwedge_{i \in I} E\, h(i))$ where $h(i)$ is a path formula with no
further path quantifiers that only refers to the local propositions of process $i$, $I$
is the set of all process indices except process 1, and $q_1$ is the local proposition of
process 1. Let $\phi'$ denote the sub-formula $\bigwedge_{i \in I} E\, h(i)$. This is a partially symmetric
sub-formula. We take $\mathcal{R}$ to be the set $\{q_1, \phi'\}$, since it is a “good” maximal
independent set.

We construct $GQS\_Stru(H, \mathcal{H}, \mathcal{R})$. Let $M$ be the total number of nodes
in $GQS(H, \mathcal{H})$. $M$ can be exponentially smaller than the number of nodes in
the full reachability graph, i.e., the number of nodes in $K\_Stru(G, \mathcal{Q})$. It is not
difficult to show that the number of nodes in $GQS\_Stru(H, \mathcal{H}, \mathcal{R})$ is at most $nM$.
During the construction of $GQS\_Stru(H, \mathcal{H}, \mathcal{R})$, we need to determine which of
its nodes satisfy the sub-formula $\phi'$. To determine this, we invoke the procedure of
Subsection 4.1 only once. During this procedure, for each $i \in I$, we determine the
nodes that satisfy the sub-formula $E\, h(i)$. This is done by unwinding $GQS(H, \mathcal{H})$.
The resulting structure is also of size at most $nM$. Thus the overall complexity
of this procedure is $O(n^2 M)$.

However, if we use the direct approach and unwind $GQS(H, \mathcal{H})$ (or if we
use $QS\_Stru(G, \mathcal{G}, pred(\phi))$) then we will get the full reachability graph. Thus
we see that the above example is a case for which the method of this section
is exponentially better than the direct approach (an example program is the
resource controller with $n$ identical user processes). On the other hand, one can give
examples where the direct method is better than the method of this section. As
observed, this occurs for cases when the formula has no symmetric (or partially
symmetric) sub-formulas. It is to be noted that the formula $\phi$, given above, is
not an ICTL* formula and hence the methods of [?] cannot be applied.

5 Experimental Results 

In this section, we report some preliminary experimental results evaluating the 
techniques proposed in this paper. Experiments were performed in conjunction
with the SMC tool. A first example is the simple resource allocation system
described at the beginning of Section 3. We considered a variant of the system
with priorities where user 1 is given higher priority than all other users. We
checked the following property for various values of $i$: is it possible to reach a
global state where one of the first i users is holding the resource and the resource 
is still available? 

We used two approaches to check the above property. Both approaches give
the correct answer. The first approach employs the structure $QS\_Stru(G, \mathcal{G}, \mathcal{Q})$; here
$\mathcal{G}$ is the set of automorphisms induced by process permutations that fix each
of the first $i$ processes and arbitrarily permute the other user processes. The
second approach uses formula decomposition of Section 4.1. The decomposed
sub-formulas are checked by unwinding $GQS(H, \mathcal{H})$ with respect to the atomic
predicates of the sub-formulas independently; here $\mathcal{H}$ is the set of automorphisms,
induced by process permutations, that arbitrarily permute all the user processes.
Formula decomposition was performed manually and SMC was used to check
the sub-cases.

Table 1. Comparison of the Two Approaches.

Value of i   First approach            Second approach
             (employing QS_Stru)       (using formula decomposition)
2            14/2676                   14/1863
3            19/3260                   16/1864
4            39/4270                   18/1865
5            130/6505                  20/1866
6            575/11404                 22/1867

Table 1 compares the run-time and memory usage of the two approaches, for
the resource allocation system described above with a total number of 80 user 
processes. Each entry in the table has the form x/y where x is the run-time in 
seconds and y is the memory usage in Kbytes. Clearly, the second approach, 
i.e. the approach with formula decomposition, performs better than the first 
approach; the difference in their performances becomes more pronounced for 
larger values of i. 

We also performed experiments using the Fire-wire protocol (with administrator
module) considered in [14], using a configuration with three stations.
We checked whether it is possible for either station 1 or station 2 not to receive an
acknowledgment after a message is sent. Again, we compared the above two ap- 
proaches. The first approach took 80 seconds and used 24 Mbytes of memory 
to complete the verification, while the second approach (i.e. the direct approach 
with formula decomposition) took 58 seconds and used 12.8 Mbytes of memory. 

6 Conclusion and Related Work 

We have presented new algorithmic techniques for exploiting symmetry in model 
checking. We have generalized symmetry reduction to a larger class of automor- 
phisms, so that systems with little or no symmetry can be verified more efficiently 
using symmetry reduction. We also presented novel techniques based on formula 
decomposition and sub-formula tracking. Preliminary experimental results are 
encouraging. Full implementation, and further evaluation with respect to real-world
examples, need to be carried out as part of future work.

As mentioned earlier, symmetry reduction in model checking has been extensively
studied in [?]. The problem of verifying properties of
systems with little or no symmetry was first considered in [?]. The work presented
in [?] also considers general automorphisms. There, only the
verification of symmetric properties was discussed. In contrast, our algorithms
can be used to verify any property specified in CTL*, even if the property
is not symmetric. [?] presents a verification method for ICTL* formulas. Our
sub-formula tracking technique can also be used to efficiently verify properties
specified in ICTL*, in addition to being applicable to any CTL* formula. Formula
symmetry was explicitly considered in [?] where quotient structures are
constructed with respect to automorphisms representing symmetries of the program
as well as of the formula. Our sub-formula tracking technique indirectly
uses formula symmetry dynamically as the GQS is unwound.
Abstract. In this paper we present the application of generalized retiming for
temporal property checking. Retiming is a structural transformation that relo- 
cates registers in a circuit-based design representation without changing its ac- 
tual input-output behavior. We discuss the application of retiming to minimize 
the number of registers with the goal of increasing the capacity of symbolic state 
traversal. In particular, we demonstrate that the classical definition of retiming 
can be generalized for verification by relaxing the notion of design equivalence 
and physical implementability. This includes (1) omitting the need for equivalent 
reset states by using an initialization stump, (2) supporting negative registers, 
handled by a general functional relation to future time frames, and (3) elimi- 
nating peripheral registers by converting them into simple temporal offsets. The 
presented results demonstrate that the application of retiming in verification can 
significantly increase the capacity of symbolic state traversal. Our experiments 
also demonstrate that the repeated use of retiming interleaved with other struc- 
tural simplifications can yield reductions beyond those possible with single appli- 
cations of the individual approaches. This result suggests that a tool architecture 
based on re-entrant transformation engines can potentially decompose and solve 
verification problems that otherwise would be infeasible. 



1 Introduction 

The main bottleneck of temporal property checking is the potentially exorbitant compu- 
tational resources necessary for state traversal. In general, there is no clear dependency 
between the structure or size of the analyzed circuit and the resource requirements to 
perform reachability analysis. However, a smaller number of state bits, i.e., registers, 
generally correlates with a lower memory and runtime consumption for performing 
state traversal. In particular, for BDD-based techniques [?], fewer registers result in
fewer BDD variables which typically decreases the size of the BDDs representing the
set of states and transitions among them. Similarly, in SAT-based state enumeration [?],
the complexity of the state recording device directly depends on the number of registers. 
A second motivation for our work comes from the observation that a reduced number 
of registers often decreases the functional correlation between them. Intuitively, this 
produces a less scattered state encoding which results in a more compact BDD or cube 
structure for BDD or SAT-based reachability analysis, respectively. 

In this paper we discuss the application of retiming to reduce the number of reg- 
isters with the goal of improving symbolic reachability analysis. Retiming is com- 
monly referred to as a structural transformation of a circuit-based design description 
that changes the positions of the state holding elements without modifying the input-output
behavior [?]. Traditionally, the use of retiming is focused on design synthesis
with two constraints that fundamentally limit the solution space: the circuit must be 
physically implementable and it must preserve its original input-output behavior. In 
property verification these restrictions can be lifted, which results in significantly more 
freedom for register minimization. There are three extensions of classical retiming for 
a generalized application in verification. First, a temporally partitioned state traversal 
eliminates the restriction on the retimed circuit of having an equivalent reset state. Sec- 
ond, a generalized symbolic state traversal algorithm can handle “negative registers.” 
This significantly increases the solution space for legal retimings by removing the non- 
negative register count constraints from the problem formulation. Third, state bits which 
are exclusively driven by primary inputs or drive only primary outputs represent a mere 
temporal shift of peripheral values, and can be suppressed for state space traversal. 

In this paper we describe the application of retiming for verification using these 
three generalizations. This work provides a specific approach in a more general scheme 
for property checking which uses a set of targeted circuit transformations. In an engine- 
based architecture, a retiming engine is applied as one step in a series of transformations 
which gradually simplify the verification problem until it can be solved by a terminal 
engine (e.g., BDD- or SAT-based). Note that such a modular, transformation-based approach
was key in making automatic logic synthesis practical [?].



2 Illustrating Example 

Figure 1a shows a circuit example with six registers $R_1, \ldots, R_6$, two inputs $a$ and $b$,
and one output $p$. Using a notation introduced in Section 4, the initial states of the six
registers are assumed to be / = {I 21 , = (1, 0, 0, 1, 0, 1). The 

subscript and superscript denote the circuit arc and the register position along this arc, 
respectively. Further, let p = 1 be a predicate to be checked for all reachable states. 

Retiming moves registers forward and backward across gates with the goal of min- 
imizing their count. The corresponding optimization problem can be formulated as an 
Integer Linear Program (ILP) using a directed graph model of the circuit [?]. The graph
vertices and arcs represent gates and interconnections (i.e., wires), respectively. A special
host vertex is introduced which is connected to all inputs and outputs. Figure 1b
shows the retiming graph for the given example. The arc labels denote the number of
registers at the corresponding nets. The ILP determines a lag for each vertex which
represents the number of registers moved backward through it [?].

The original definition of retiming for synthesis requires preserving input-output 
behavior. With this restriction, the circuit of Figure 1a cannot be retimed since registers
$R_1$ and $R_2$ have incompatible initial states and cannot be merged by a backward move.
To show this, if both registers were shared with a joint initial state of 1, the sequence
$(a, b) = ((0,0), (1,0), (0,0))$ would produce $p = (1, 1, 1)$ and $p = (1, 1, 0)$ in the original
and retimed circuit, respectively. Similarly, for a joint initial state of 0 the sequence
(a, b) = ((1, 0), (0, 0), (1, 0), (0, 0)) would distinguish the behavior of the circuits. 

Fig. 1: Retiming Example: (a) Original Circuit, (b) Retiming Graph.

In verification, we need not preserve input-output equivalence of the retimed circuit
as long as we can preserve the truth of the given properties. The requirement for
equivalent reset states can be relaxed by unrolling the circuit for multiple cycles until
the set of retimed initial states is uniquely determined. This corresponds to a temporal 
decomposition of the verification task into two parts: (1) checking a bounded acyclic 
initialization structure, further referred to as the retiming stump, and (2) checking the 
retimed circuit, further referred to as the retimed recurrence structure. The first part in- 
volves a SAT check to prove the correctness of the properties for the time frames that are 
included in the retiming stump. The second part involves model checking the retimed 
circuit, which effectively provides an inductive correctness proof for all remaining time 
frames. The initialization state of the retimed circuit can be computed by symbolically 
simulating the retiming stump up to the retimed recurrence structure. 

Registers at the inputs and outputs are mere temporal signal offsets and do not
impact the state reachability of the circuit core. Thus, they can be ignored during
reachability analysis. For failing properties, the offsets are restored by temporal shifts
in the counter-example trace. Adopting the terminology from Malik et al. [?], we will
refer to this method as peripheral retiming. For peripheral retiming the host vertex is
removed from the retiming graph, causing the ILP to pull as many registers as possible
out of the circuit. Figure 2a shows the graph for a maximal peripheral retiming of the
example ignoring initial state equivalence. The arc labels represent the register counts
of the original and retimed circuit. The vertex labels denote their lag, i.e., the number
of registers that have been pushed backward through them. As shown, by merging $R_1$
and $R_2$ and removing $R_6$, the register count could be reduced from six to four.

Fig. 2: Graphs for relaxed retimings for the example of Fig. 1: (a) peripheral retiming
ignoring reset state equivalence, (b) retiming with negative registers permitted.

Fig. 3: Retiming result of Fig. [?]: (a) retimed circuit, (b) intuitive interpretation of negative
registers, (c) interpretation of the unrolled circuit structure (dark: retiming stump,
medium shaded: retiming recurrence structure, lightly shaded: retiming top).

A third relaxation of retiming is achieved by enabling negative register counts at 
the arcs. This approach is motivated by the fact that registers merely denote functional 
relations between different time frames. In logic synthesis, clocked or unclocked delay 
elements are used to physically implement these relations. Such delays can only realize 
backward constraints, each consisting of a combinational expression in the present and a 
variable in a future time frame. In symbolic verification, this limitation can be lifted and 
arbitrary relations can be handled. This includes forward constraints between variables 
in the current time frame and expressions in future time frames, represented by negative 
registers. In contrast to the common case of symbolic forward traversal, constraints 
imposed by negative registers delay the decision about the actual reachability of a state 
until all referred future time frames are processed. This results in a third component for 
the above described temporal verification decomposition, reflected by the retiming top. 

To enable negative registers, the non-negativity constraints on the arc labels are removed
from the ILP. Figure 2b shows the resulting retiming graph for the example. By
using one negative register, the total register count is reduced to three. Figure 3a shows
the resulting circuit. Note that these three registers reflect the actual temporal relations
present in the loops and reconverging paths of the original circuit. Figure 3b gives an
intuitive interpretation of negative registers in a circuit context. In symbolic reachability
analysis, negative registers can simply be handled by exchanging the current and next
state variables in the transition relation. Figure 3c illustrates the retiming process using
the unrolled circuit structure. The medium shaded area reflects the retimed recurrence 
structure which is passed to symbolic model checking. The dark area denotes the retim- 
ing stump which is used to compute the initial state for the retimed circuit and to verify 
p for the first two time frames. The lightly shaded area represents the retiming top. 

The actual verification process consists of several steps. First, we need to prove 
that the property holds for the retiming stump using a SAT check. In the given example,
it is easy to show that $p^i = 1$ for $i = 0, 1, 2$. Further, the set of initial states $\tilde{I}$ for
the retimed recurrence structure is computed by symbolically executing the stump, resulting
in $\tilde{I} = \exists a^0\, \exists b^0\, \exists v.\,(I_{42} = a^0 \wedge I_{34} = v \wedge I_{54} = 1) =
\{(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)\}$. Next, starting from these initial states, symbolic
traversal is performed on the retimed structure. This leads to a counter example for the
initial state $(I_{42}, I_{34}, I_{54}) = (0, 1, 1)$ with the inputs $a = 0$ and $b = 0$. Further, the
retiming top imposes a constraint on the negative register $I_{34} = a^2 \vee b^2$ which can be
satisfied for the given failing state. A complete counter-example trace is composed by 
a satisfying assignment of the retiming stump for generating the required reset state of 
the retimed structure, a counter-example trace generated by the retimed structure, and a 
satisfying assignment for the constraint imposed by the negative registers. For the given 
example, this results in (a, b) = ((0, 0), (0, 0), (0, 1)). 



3 Previous Work 



The application of structural circuit transformations in sequential verification is a relatively
new research area. Hasteer et al. [?] proposed the concepts of retiming and state
space folding for sequential equivalence checking. Their state-folding technique works
for circuits in which the number of latches contained in loops and reconverging paths
is constant modulo $n$. In this case $n$ succeeding state transitions can be concatenated
for symbolic state traversal. Baumgartner et al. [?] extend the state-folding concept to
handle arbitrary registers and general CTL property checking. The idea of state space
folding is orthogonal to the retiming approach described in this paper, and the combination
of both techniques is a promising subject of our future research.

For logic optimization, Leiserson and Saxe [?] describe the application of structural
retiming and propose an ILP [?] formulation using a graph model. Malik et al. [?]
were the first to introduce peripheral retiming with the objective of moving a maximum
number of registers to the circuit boundaries. This makes the combinational circuit core 
as large as possible for providing maximum freedom for conventional combinational 
optimizations. They also introduced the concept of negative registers as a method of 
temporarily “borrowing” registers from inputs and outputs. After finishing the combi- 
national optimization, these registers are “legalized” by retiming them back to positive 
registers. In contrast, our paper describes the direct application of negative registers for 
verification and gives formal algorithms to fully handle them. 
The problem of generating valid initial states for the retimed circuit has been addressed
in several publications. Touati and Brayton [?] proposed a method for adding
reset circuitry which forces an equivalent initial state. Even et al. [?] described a modified
retiming algorithm that favors forward retiming, allowing a simple computation
of the initial states. All previous work on reset state computation assumes input-output 
equivalence. In this paper we propose a method of eliminating that limitation for verifi- 
cation and describe how a more generalized reset state can be obtained. 

Gupta et al. [?] were first to propose the application of maximal peripheral retiming
in the context of simulation-based verification. They showed that peripheral registers
can be omitted during test generation without compromising the coverage of the resulting
transition tour. Still, their approach is focused on test generation and does not
consider full reachability. Further, the paper does not address the initialization problem
and does not use the concept of negative registers. The work of Cabodi et al. [13],
which uses retiming to enhance symbolic reachability analysis, is the closest to ours.
However, they use an original synthesis retiming algorithm with the above-mentioned 
limitations regarding enforced reset state equivalence and non-negative registers. Fur- 
ther, the applied retiming grid is based on next-state functions which significantly re- 
duces the optimization freedom. Consequently, the reported results show mostly modest 
improvements over existing techniques. 

4 Generalized Retiming for Verification 

Let $C = (G, E)$ denote a circuit where $G$ represents a set of combinational gates, primary
inputs, and primary outputs, and $E \subseteq G \times G$ is a set of arcs connecting the gates.
Each arc $(u, v) \in E$ is associated with a non-negative weight $w(u, v)$ representing the
number of registers at this arc. Clearly, for all hardware designs we can assume that the
initial register count of all arcs is non-negative; i.e., $w(u, v) \ge 0$. Further, without loss
of generality, we assume that the circuit does not contain combinational loops.

Let $I^i_{uv}$, $0 < i \le w(u,v)$, denote the initial value of register $i$ along arc $(u,v)$
and $g_u(f_{ju}, \ldots, f_{ku})$ be the function of gate $u$ using the functions $f_{ju}, \ldots, f_{ku}$ of arcs
$(j,u), \ldots, (k,u)$ at its inputs. If $u$ represents a primary input, $f_u^t$ denotes the sampled
input value at a given time. The state of $C$ at time $t \ge 0$ is computed recursively as:

$$f^t_{uv} = \begin{cases} I^{\,w(u,v)-t}_{uv} & \text{if } t < w(u,v), \\ f_u^{\,t-w(u,v)} & \text{otherwise,} \end{cases} \qquad f_u^t = g_u(f^t_{ju}, \ldots, f^t_{ku}). \tag{1}$$

This definition of $f$ can be used to express the function of any internal net of the design
modeled by $C$. For example, the value at time $t$ of the net connecting the output of
register $i$ with the input of register $i + 1$ of arc $(u, v)$ is $f^{\,t+w(u,v)-i}_{uv}$.

A retiming of $C$ is defined as a gate labeling $r : G \to \mathbb{Z}$, where $r(u)$ is the lag of
gate $u$ denoting the number of registers that are moved backward through it. The new
arc weights $\tilde{w}$ of the retimed circuit $\tilde{C}$ are computed as follows:

$$\tilde{w}(u, v) = w(u, v) + r(v) - r(u). \tag{2}$$
In this context we are interested in minimizing the total number of registers of $\tilde{C}$:

$$\sum_{\forall (u,v) \in E} |\tilde{w}(u, v)| \to \min. \tag{3}$$



Note that due to the missing host vertex, the formulation aims at maximal peripheral 
retiming which removes registers from the primary inputs and outputs. The given mod- 
eling does not take into account that the registers of the outgoing arcs from a gate can 
be shared and must be counted only once in the objective function. A correct ILP modeling
of “register sharing” can be achieved by a slightly modified problem formulation
for which the details are presented in [4]. In contrast to retiming for synthesis, we do not
impose a non-negativity constraint on $\tilde{w}$. Therefore, the new circuit may have negative
arc weights, representing negative registers.
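
As a small illustration of equation (2) and objective (3), the following Python sketch recomputes arc weights from a given lag assignment and counts registers by absolute value. The graph, the lag values, and the function names are hypothetical; register sharing is ignored, and no ILP is solved here.

```python
def retime(weights, lag):
    """Equation (2): new weight = w(u, v) + r(v) - r(u) for every arc."""
    return {(u, v): w + lag[v] - lag[u] for (u, v), w in weights.items()}

def register_count(weights):
    """Objective (3): negative registers are counted by absolute value."""
    return sum(abs(w) for w in weights.values())

# Hypothetical graph: input a, gates g1 and g2 forming a loop, output p.
weights = {("a", "g1"): 1, ("g1", "g2"): 0, ("g2", "g1"): 1, ("g2", "p"): 0}
lag = {"a": 0, "g1": -1, "g2": -1, "p": -1}

retimed = retime(weights, lag)
print(retimed)
print(register_count(weights), register_count(retimed))
```

In this example the register on the input arc disappears as a peripheral offset, while the register inside the loop is retained.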

Equation (2) imposes an equivalence relation on the set of retimings. Two retimings
$r_1$ and $r_2$ result in identical circuits and are said to be equivalent if and only if $r_1 =
r_2 + c$, where $c$ denotes an integer constant. We define a normalized retiming $r'$ as:

$$r' = r - \max_{\forall u} r(u). \tag{4}$$
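
The effect of normalization can be checked with a short sketch: shifting all lags by a constant leaves the retimed arc weights unchanged, and after normalization every lag is at most zero. The example graph and the function names are assumptions made for illustration.

```python
def retime(weights, lag):
    """New arc weight = w(u, v) + r(v) - r(u)."""
    return {(u, v): w + lag[v] - lag[u] for (u, v), w in weights.items()}

def normalize(lag):
    """Subtract the maximum lag so that every normalized lag is <= 0."""
    top = max(lag.values())
    return {u: r - top for u, r in lag.items()}

# Hypothetical graph and an unnormalized lag assignment.
weights = {("a", "g1"): 1, ("g1", "g2"): 0, ("g2", "g1"): 1, ("g2", "p"): 0}
lag = {"a": 3, "g1": 2, "g2": 2, "p": 2}

norm = normalize(lag)
same_circuit = retime(weights, lag) == retime(weights, norm)
print(norm, same_circuit)
```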



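In the following we will use the term retiming to denote normalized retimings. Similar
to formula (1), for a given retiming $r$ the state of $\tilde{C}$ at time $t$ can be computed as:

$$\tilde{f}^t_{uv} = \begin{cases} \tilde{I}^{\,\tilde{w}(u,v)-t}_{uv} & \text{if } t < \tilde{w}(u,v), \\ \tilde{f}_u^{\,t-\tilde{w}(u,v)} & \text{otherwise,} \end{cases} \qquad \tilde{f}_u^t = g_u(\tilde{f}^t_{ju}, \ldots, \tilde{f}^t_{ku}), \tag{5}$$

where the $\tilde{I}^i_{uv}$ represent the initial states of $\tilde{C}$. In contrast to formula (1), it is not obvious
that this formula is well formed, because the $\tilde{w}(u, v)$ can assume negative values.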



Theorem 1. Let C be a circuit containing a finite number of gates, arcs, and non-negative
registers without combinational loops, and r be a retiming resulting in circuit
$\tilde{C}$. The evaluation of formula (5) for computing the state of $\tilde{C}$ at time t will terminate
for any finite $t \ge 0$.



Proof. First, it is obvious that t remains non-negative during the evaluation of (5). Second,
since C and therefore $\tilde{C}$ contain a finite number of gates, any non-terminating evaluation
of formula (5) must involve an infinite recursion on at least one gate. Let u be one
of those gates and
$p = u \xrightarrow{(u,u_1)} u_1 \xrightarrow{(u_1,u_2)} \cdots \xrightarrow{(u_n,u)} u$
be the circular path in $\tilde{C}$ corresponding to the recursion. The difference between t and t' of two succeeding recursions
is then $t - t' = \tilde{w}(u,u_1) + \tilde{w}(u_1,u_2) + \ldots + \tilde{w}(u_n,u)$. A substitution using (2) leads to
$t - t' = w(u,u_1) + w(u_1,u_2) + \ldots + w(u_n,u)$. All registers are non-negative ($w(u_i,u_j) \ge 0$),
and there are no combinational loops ($\exists (u_i,u_j) \in p$ with $w(u_i,u_j) > 0$). Therefore
t strictly decreases after each recursion, which causes the evaluation to terminate
once $t < \tilde{w}(u_i,u_j)$ for some arc $(u_i,u_j) \in p$. □
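
The substitution step of the proof can be written out explicitly. As an indexing convention introduced here only for illustration, let $u_0 = u_{n+1} = u$ denote the two endpoints of the cycle p. Applying equation (2) to each arc makes the lags cancel pairwise:

```latex
\begin{align*}
\sum_{k=0}^{n} \tilde{w}(u_k, u_{k+1})
  &= \sum_{k=0}^{n} \bigl( w(u_k, u_{k+1}) + r(u_{k+1}) - r(u_k) \bigr) \\
  &= \sum_{k=0}^{n} w(u_k, u_{k+1}) + r(u_{n+1}) - r(u_0) \\
  &= \sum_{k=0}^{n} w(u_k, u_{k+1}).
\end{align*}
```

The total weight of any cycle is therefore invariant under retiming, which is why a cycle of the retimed circuit still consumes at least one time step per traversal even if some of its individual arcs carry negative weights.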



The retiming stump S of a retiming r is a partial unrolling of C and is defined as:

\[
S = \{\, s^t_{uv} \mid s^t_{uv} = f^t_{uv} \wedge (u,v) \in E \wedge 0 \le t < w(u,v) - r(u) \,\}. \tag{6}
\]
The new verification structure is composed of S and $\tilde{C}$, where S provides the arc functions
for the first cycles and the initial states for the positive registers of $\tilde{C}$ as follows:

\[
\tilde{s}^t_{uv} = s^{\,t-r(v)}_{uv}, \qquad 0 \le t < \tilde{w}(u,v). \tag{7}
\]

Note that this formula is well formed for normalized retimings because $r(v) \le 0$.

Theorem 2. Let C be a circuit containing a finite number of gates, arcs, and non-negative
registers without combinational loops and r be a retiming resulting in circuit $\tilde{C}$
and the retiming stump S. The following relations provide a bijective mapping between
each arc function of $\{\tilde{C}, S\}$ to the corresponding arc function of C and vice versa:

\[
f^t_{uv} = \begin{cases} s^t_{uv} & \text{if } t < \tilde{w}(u,v) - r(v), \\ \tilde{f}^{\,t+r(v)}_{uv} & \text{otherwise,} \end{cases} \tag{8}
\]

\[
s^t_{uv} = f^t_{uv} \quad \text{if } t < \tilde{w}(u,v) - r(v), \qquad \tilde{f}^t_{uv} = f^{\,t-r(v)}_{uv}. \tag{9}
\]



Proof. First we show that function (8) correctly maps $\{\tilde{C}, S\}$ to C: For $t < \tilde{w}(u,v) - r(v)$,
(8) reflects the definition of s given in (6). For $t \ge \tilde{w}(u,v) - r(v)$, after substitution
using (5), we must show that
$f^t_{uv} = g_u\bigl(\tilde{f}^{\,t+r(v)-\tilde{w}(u,v)}_{iu}, \ldots, \tilde{f}^{\,t+r(v)-\tilde{w}(u,v)}_{ju}\bigr)$,
which is done by inductively proving for the arguments of $g_u$ that
$\tilde{f}^{\,t+r(v)-\tilde{w}(u,v)}_{iu} = f^{\,t-w(u,v)}_{iu}$.

Base case ($t + r(v) - \tilde{w}(u,v) < \tilde{w}(i,u)$): Using (5) and (7) we get
$\tilde{f}^{\,t+r(v)-\tilde{w}(u,v)}_{iu} = s^{\,t+r(v)-\tilde{w}(u,v)-r(u)}_{iu}$; applying (2) shows the required
equality. Inductive step ($t + r(v) - \tilde{w}(u,v) \ge \tilde{w}(i,u)$): A substitution using
(5) results in
$\tilde{f}^{\,t+r(v)-\tilde{w}(u,v)}_{iu} = g_i\bigl(\tilde{f}^{\,t+r(v)-\tilde{w}(u,v)-\tilde{w}(i,u)}_{ji}, \ldots, \tilde{f}^{\,t+r(v)-\tilde{w}(u,v)-\tilde{w}(i,u)}_{ki}\bigr)$.

If $\tilde{w}(i,u) > 0$ we can immediately reduce the arguments of $g_i$ by induction, which results
in $g_i\bigl(f^{\,t-w(u,v)-w(i,u)}_{ji}, \ldots, f^{\,t-w(u,v)-w(i,u)}_{ki}\bigr) = f^{\,t-w(u,v)}_{iu}$ and shows equivalence.

If $\tilde{w}(i,u) \le 0$, then the right hand side needs to be further expanded until an inductive
reduction can be performed. A termination analysis similar to the proof of Theorem 1
can be applied showing the superscript value of $\tilde{f}$ will eventually decrease and therefore
the expansion will terminate after a finite number of steps. Next, showing that (9)
correctly maps C to $\{\tilde{C}, S\}$ is straightforward by using the definition for s for the first
part and an inductive proof identical to the one used in the first theorem for the second
part. □

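Corollary 1. Let $\tilde{C}$ be derived from C by retiming and c be a Boolean constant, then

\[
\forall t.\,(f^t_{uv} = c) \;\Leftrightarrow\; \forall t.\,\bigl[(0 \le t < \tilde{w}(u,v) - r(v)) \Rightarrow (s^t_{uv} = c)\bigr] \wedge \forall t'.\,(\tilde{f}^{\,t'}_{uv} = c). \tag{10}
\]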



In other words, generalized retiming provides a circuit transformation that is sound
and complete for verifying properties of the form AG(p), where the primary circuit
inputs are non-deterministic and p is a predicate on any net of the circuit. Its application
for more complex safety properties requires that the property formula be expressed as
a circuit which is composed with the actual design before retiming can be applied.
Similarly, in order to handle constrained circuit inputs, the verification environment
must be composed with the circuit before retiming can be applied.

Corollary 2. Let $\tilde{C}$ be a circuit derived from C by retiming and S be the corresponding
retiming stump. Further, let AG(p) be a property that fails for $\tilde{C}$ for an initial state $\tilde{I}$
resulting in a counter-example trace $\tilde{T}$. The counter example T for the original circuit
C can be obtained by applying formula (8) on $\tilde{T}$ and S.

In essence, formula (8) provides the mechanism for trace lifting that back-translates any
counter example from the retimed circuit to the original circuit.

5 Transformation-Based Verification 

We implemented the retiming transformation as a re-entrant reduction engine with a 
“push” interface similar to a BDD package. The engine consumes a circuit from a 
higher-level engine, performs retiming, and then passes the resulting circuit down to a 
lower-level engine. For debugging of failing properties, the engine implements a back- 
translation mechanism that passes counter-example traces from the lower-level engine 
back to the higher-level. This setting allows an iterative usage of retiming and other 
reduction algorithms until the circuit can be passed to a “terminal” decision engine. 
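
The control flow of such an engine chain can be sketched as follows. This is a toy model under stated assumptions, not the authors' implementation: the class names `Engine` and `Terminal`, the list-based stand-in for a circuit, and the logging are all hypothetical. It shows only the order in which circuits are pushed down and counter-example traces are translated back up.

```python
class Engine:
    """One reduction engine in the chain (all names are illustrative only)."""

    def __init__(self, name, child, log):
        self.name, self.child, self.log = name, child, log

    def transform(self, circuit):
        # Stand-in for a real reduction such as retiming or redundancy removal.
        self.log.append(f"{self.name}: reduce")
        return circuit + [self.name]

    def lift(self, trace):
        # Stand-in for back-translating a counter-example trace.
        self.log.append(f"{self.name}: lift")
        return trace

    def push(self, circuit):
        # Consume a circuit, pass the reduced circuit down, lift any trace on return.
        trace = self.child.push(self.transform(circuit))
        return None if trace is None else self.lift(trace)


class Terminal:
    """Stand-in for a terminal decision engine; here it always reports a failure."""

    def __init__(self, log):
        self.log = log

    def push(self, circuit):
        self.log.append("terminal: decide")
        return ["counter-example"]


log = []
chain = Engine("COM", Engine("RET", Terminal(log), log), log)
trace = chain.push([])
print(log)
```

Because every engine exposes the same `push` method, engines can be stacked in any order or repeated, which mirrors the iterative use of retiming and other reductions described above.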

As an internal data structure we use a two-input And/Inverter graph similar
to the one presented in [?] except that registers are modeled as edge attributes. This
representation allows the application of several on-the-fly reduction algorithms, including
inverter “dragging” and forward retiming of latches, both enabling a generalized
identification of functionally identical structures by hashing. As an ILP solver we utilized
the primal network simplex algorithm from IBM’s Optimization Solutions Library
(OSL) [?] to solve the register minimization problem.

As a second simplification engine, we implemented an algorithm for combinational
redundancy removal which was adopted from an equivalence checking application [?].
This engine uses BDD sweeping and a SAT procedure to identify and eliminate functionally
equivalent circuit structures, including the removal of redundant registers. As
a terminal reachability engine we adapted VIS [?] version 1.4 (beta) for our experiments.
In addition to the partitioned transition relation algorithm, VIS 1.4 incorporates
a robust hybrid image computation approach.

6 Experimental Results 

We performed a number of experiments to evaluate the impact of retiming on symbolic 
reachability analysis, using 31 sequential circuits from the ISCAS89 benchmarks and 
27 circuits randomly selected from IBM’s Gigahertz Processor (GP) design. All exper- 
iments were done on an IBM RS/6000 Model 260, with a 256 MBytes memory limit. 

In the first set of experiments we assessed the potential of generalized retiming
for reducing register count. In particular, we evaluated an iterative scheme where the
retiming engine (RET) and the combinational reduction engine (COM) are called in an
interleaved manner. The results for the ISCAS and GP circuits are given in Table 1.
For the ISCAS benchmarks, we list only the circuits with more than 16 registers since
Table 1: Retiming results for ISCAS circuits (upper part) and GP circuits (lower part).
The columns Original through COM-RET 3 Iterations give the number of registers (negative registers in parentheses).

| Design | Original | COM Only | RET Only | COM-RET 1 Iteration | COM-RET 2 Iterations | COM-RET 3 Iterations | Relative Reduction (Best) | Max. Lag | Time (s) / Memory (MB) (Best) | Results of S/O (Registers) |
|---|---|---|---|---|---|---|---|---|---|---|
| PROLOG | 136 | 81 | 45 (1) | 45 (1) | 45 (3) | 44 (2) | 67.6% | 2 | 1.4 / 22.4 | -/- |
| S1196 | 18 | 16 | 16 | 14 | 14 | 14 | 22.2% | 1 | 0.6 / 10.7 | 16/- |
| S1238 | 18 | 17 | 16 | 15 | 14 | 14 | 22.2% | 1 | 0.9 / 21.1 | 17/- |
| S1269 | 37 | 37 | 36 | 36 | 36 | 36 | 2.7% | 1 | 0.4 / 6.2 | -/- |
| S13207_1 | 638 | 513 | 390 | 343 | 292 (1) | 289 | 54.7% | 11 | 3.8 / 34.7 | -/- |
| S1423 | 74 | 74 | 72 | 72 | 72 | 72 | 2.7% | 1 | 0.5 / 6.2 | 72/74 |
| S1512 | 57 | 57 | 57 | 57 | 57 | 57 | 0.0% | 1 | 0.5 / 6.2 | -/57 |
| S15850_1 | 534 | 518 | 498 | 488 | 485 | 485 | 9.2% | 6 | 5.3 / 31.8 | -/- |
| S3271 | 116 | 116 | 110 | 110 | 110 | 110 | 5.2% | 5 | 0.7 / 7.0 | -/116 |
| S3330 | 132 | 81 | 44 (2) | 44 (3) | 44 (2) | 44 (2) | 66.7% | 3 | 0.7 / 7.0 | -/- |
| S3384 | 183 | 183 | 72 | 72 | 72 | 72 | 60.7% | 6 | 0.7 / 7.1 | -/147 |
| S35932 | 1728 | 1728 | 1728 | 1728 | 1728 | 1728 | 0.0% | 1 | 7.2 / 38.0 | -/- |
| S382 | 21 | 21 | 15 | 15 | 15 | 15 | 28.6% | 1 | 0.3 / 5.9 | 15/- |
| S38584_1 | 1426 | 1415 | 1375 | 1375 | 1374 | 1374 | 3.6% | 5 | 29.4 / 127.4 | -/- |
| S400 | 21 | 21 | 15 | 15 | 15 | 15 | 28.6% | 0 | 0.3 / 5.9 | 15/- |
| S444 | 21 | 21 | 15 | 15 | 15 | 15 | 28.6% | 1 | 0.3 / 5.9 | 15/- |
| S4863 | 104 | 88 | 37 | 37 | 37 | 37 | 64.4% | 4 | 0.9 / 7.3 | -/96 |
| S499 | 22 | 22 | 22 | 22 | 20 | 20 | 9.1% | 1 | 0.6 / 15.1 | -/- |
| S526N | 21 | 21 | 21 | 21 | 21 | 21 | 0.0% | 2 | 0.4 / 5.9 | -/- |
| S5378 | 179 | 164 | 112 (6) | 112 (6) | 111 (6) | 111 (6) | 38.0% | 5 | 1.6 / 18.4 | -/144 |
| S635 | 32 | 32 | 32 | 32 | 32 | 32 | 0.0% | 1 | 0.4 / 5.9 | -/- |
| S641 | 19 | 17 | 15 | 15 | 15 | 15 | 21.1% | 2 | 0.4 / 5.9 | 18/- |
| S6669 | 239 | 231 | 92 | 75 | 75 | 75 | 68.6% | 5 | 1.6 / 14.1 | -/- |
| S713 | 19 | 17 | 15 | 15 | 15 | 15 | 21.1% | 2 | 0.4 / 5.9 | -/- |
| S838_1 | 32 | 32 | 32 | 32 | 32 | 32 | 0.0% | 0 | 0.5 / 6.1 | -/- |
| S9234_1 | 211 | 193 | 172 | 172 | 165 | 131 | 37.9% | 3 | 2.5 / 26.2 | -/- |
| S938 | 32 | 32 | 32 | 32 | 32 | 32 | 0.0% | 0 | 0.4 / 6.1 | -/- |
| S953 | 29 | 29 | 6 | 6 | 6 | 6 | 79.3% | 0 | 0.4 / 6.1 | -/- |
| S967 | 29 | 29 | 6 | 6 | 6 | 6 | 79.3% | 0 | 0.4 / 6.1 | -/- |
| S991 | 19 | 19 | 19 | 19 | 19 | 19 | 0.0% | 2 | 0.4 / 6.0 | -/- |
| C_RAS | 431 | 431 | 378 | 370 | 348 | 348 | 19.3% | 3 | 6.0 / 22.6 | -/- |
| D_DASA | 115 | 115 | 100 | 100 | 100 | 100 | 13.0% | 2 | 0.9 / 7.1 | -/- |
| D_DCLA | 1137 | 1137 | 771 | 750 | 750 | 750 | 34.0% | 1 | 35.4 / 36.2 | -/- |
| D_DUDD | 129 | 129 | 100 | 100 | 100 | 100 | 22.5% | 3 | 0.9 / 7.0 | -/- |
| I_IBBC | 195 | 195 | 40 | 40 | 38 | 36 | 81.5% | 2 | 1.6 / 21.6 | -/- |
| I_IFAR | 413 | 413 | 142 | 139 | 136 | 136 | 67.1% | 4 | 3.1 / 19.5 | -/- |
| I_IFEC | 182 | 182 | 45 | 45 | 45 | 45 | 75.3% | 6 | 0.7 / 7.0 | -/- |
| I_IFPF | 1546 | 1356 | 673 (4) | 661 (4) | 449 (2) | 442 (2) | 71.4% | 10 | 46.5 / 127.9 | -/- |
| L_EMQ | 220 | 220 | 87 | 88 | 74 | 74 | 66.4% | 4 | 3.4 / 18.5 | -/- |
| L_EXEC | 535 | 535 | 163 | 137 | 135 | 134 | 75.0% | 6 | 9.8 / 28.1 | -/- |
| L_FLUSH | 159 | 159 | 1 | 1 | 1 | 1 | 99.4% | 3 | 0.8 / 7.0 | -/- |
| L_LMQ | 1876 | 1831 | 1190 | 1185 | 433 (3) | 425 (3) | 77.3% | 3 | 50.7 / 139.1 | -/- |
| L_LRU | 237 | 237 | 94 | 94 | 94 | 94 | 60.3% | 2 | 1.1 / 7.1 | -/- |
| L_PNTR | 541 | 541 | 245 | 245 | 245 | 245 | 54.7% | 3 | 1.8 / 8.8 | -/- |
| L_TBWK | 307 | 307 | 124 | 124 | 40 | 40 | 87.0% | 3 | 2.7 / 18.0 | -/- |
| M_CIU | 777 | 686 | 415 | 415 | 411 | 387 (1) | 50.2% | 15 | 26.3 / 76.6 | -/- |
| S_SCU1 | 373 | 373 | 204 | 200 | 192 | 192 | 48.5% | 3 | 9.0 / 20.6 | -/- |
| S_SCU2 | 1368 | 1368 | 566 | 565 | 426 | 423 | 69.1% | 5 | 102.2 / 67.4 | -/- |
| V_CACH | 173 | 155 | 104 (2) | 96 (3) | 96 (2) | 95 (1) | 45.1% | 9 | 1.1 / 24.0 | -/- |
| V_DIR | 178 | 151 | 87 | 83 | 43 | 42 (1) | 76.4% | 5 | 0.9 / 22.3 | -/- |
| V_L2FB | 75 | 75 | 26 | 26 | 26 | 26 | 65.3% | 2 | 0.5 / 5.9 | -/- |
| V_SCR1 | 150 | 128 | 52 | 48 (1) | 48 (1) | 48 | 68.0% | 4 | 0.7 / 10.9 | -/- |
| V_SCR2 | 551 | 551 | 86 | 82 | 82 | 82 | 85.1% | 4 | 4.4 / 15.0 | -/- |
| V_SNPC | 93 | 93 | 21 | 21 | 21 | 21 | 77.4% | 4 | 0.5 / 6.8 | -/- |
| V_SNPM | 1421 | 1216 | 233 (7) | 233 (7) | 231 (11) | 227 (8) | 84.0% | 15 | 14.7 / 65.2 | -/- |
| W_GAR | 242 | 232 | 91 (1) | 90 | 90 | 79 (1) | 67.4% | 2 | 3.2 / 25.4 | -/- |
| W_SFA | 64 | 64 | 42 | 42 | 41 | 41 | 35.9% | 1 | 1.0 / 16.0 | -/- |

smaller designs are of less interest for these experiments. Columns 2, 3, and 4 report
the number of registers of the original circuit, after applying COM only, and RET only, 
respectively. The following columns give the register counts after performing various 
numbers of iterations of COM followed by RET. The number of negative registers, if 
non-zero, is given in parentheses. For brevity, we report only up to three iterations; 
more iterations provided only marginal improvements. The reported maximum lag in 
column 9 gives an indication of the size of the retiming stump. 

Overall, the results indicate that generalized retiming has a significant potential for 
reducing the number of registers for verification. For the ISCAS benchmarks we ob- 
tained a maximum register reduction of 79% with an average of 27%. For the processor 
circuits we achieved an average reduction of 62%. 
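
The relative reduction reported in Table 1 is consistent with the usual formula (original − best) / original. The short check below recomputes it for three rows whose register counts are taken from the table; the function name `relative_reduction` is an assumption made for this illustration.

```python
def relative_reduction(original: int, reduced: int) -> float:
    """Percentage of registers removed relative to the original circuit."""
    return 100.0 * (original - reduced) / original


# (original registers, best register count) as listed in Table 1.
rows = {"PROLOG": (136, 44), "S953": (29, 6), "L_FLUSH": (159, 1)}

for name, (original, best) in rows.items():
    print(f"{name}: {relative_reduction(original, best):.1f}%")
```

The recomputed values agree with the percentages printed in the table for these rows (67.6%, 79.3%, and 99.4%).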

The number of negative registers generated by retiming is surprisingly small. This 
can be explained by the two-input And/Inverter data structure used as circuit rep- 
resentation. One can show that within each strongly connected component (SCC) of 
such circuits, there exists an optimal retiming with only positive registers. Only paths 
between the SCCs may require negative registers for an optimal solution. 

Table 2 gives the performance results for symbolic reachability analysis. We report
results for all circuits of Table 1 for which retiming resulted in a register reduction and
reachability analysis could be completed. We ran each experiment with two options for
the VIS image computation: the IWLS95 partitioned transition relation method and the
hybrid approach. The best of the two results on a per-example basis is then reported.
Although after reduction we can complete traversal for only three additional circuits, the
results clearly show that retiming significantly improves the overall performance. The
CPU time is decreased by an average of 53.1% for ISCAS and 64.0% for GP circuits,
respectively. The corresponding memory reductions are 17.2% and 12.3%, respectively.
The cumulative run time speedup is 55.7% for the ISCAS benchmarks and 83.5% for
the GP circuits. To illustrate the complexity of the retiming stump, we report the BDD
sizes for the initial states in column 7. As shown, these BDDs remain fairly small and
do not impact the complexity of the reachability analysis.

Figure 4 shows the profile of the BDD size while traversing benchmark S3330 for
the original circuit and after applying various reduction steps. This example demonstrates
how retiming typically benefits the performance of the traversal. To further illustrate
the effect of retiming on the correlation of the state encoding, we analyzed the
traversal of circuit S4863. Reachability timed out during the third traversal step of the
original circuit. Using retiming, the correlation between the remaining registers was
completely removed, resulting in full reachability of all $2^{37}$ states. While such a profound
result is likely atypical, this is strong evidence of the power of both structural
simplification and retiming to reduce register correlation.



7 Conclusions and Future Work 

We presented the application of generalized retiming for enhancing symbolic reacha- 
bility analysis. We discussed three extensions of the classical retiming approach which 
include: (1) eliminating the need for equivalent reset states by introducing the concept 
of an initialization stump, (2) supporting negative registers, handled as general func- 
Table 2: Effect of retiming on reachability analysis (C = completed within the time limit
of four hours, H = hybrid image computation, I = IWLS95 image computation).
Columns 2 to 4 refer to the original circuit, columns 5 to 8 to the reduced circuit.

| Design | Number of Registers | Reachability Steps, Algo | Time (sec) / Memory (MB) | Number of Registers | Reachability Steps, Algo | BDD init Nodes | Time (sec) / Memory (MB) | Relative Improvement Time / Memory |
|---|---|---|---|---|---|---|---|---|
| PROLOG | 136 | 17 C I | 2285 / 134.5 | 45 | 16 C H | 611 | 81.6 / 27.5 | 96.4% / 79.6% |
| S1196 | 18 | 4 C I | 1.1 / 6.5 | 14 | 2 C I | 122 | 0.5 / 6.3 | 54.5% / 3.1% |
| S1238 | 18 | 4 C I | 1.2 / 6.5 | 14 | 2 C I | [illegible] | 0.1 / 6.3 | 91.7% / 3.1% |
| S1269 | 37 | [illegible] C H | 13194 / 185.5 | 36 | 11 C H | 901 | 13395 / 187.5 | -1.5% / -1.1% |
| S3330 | 132 | [illegible] C H | 668.0 / 35.3 | 45 | 16 C I | 194 | 35.8 / 15.6 | 94.6% / 55.8% |
| S382 | 21 | 13 C I | < 0.1 / 6.2 | 15 | 11 C I | 17 | < 0.1 / 6.1 | 0.0% / 1.6% |
| S400 | 21 | 10 C I | < 0.1 / 6.2 | 15 | 10 C H | 16 | < 0.1 / 6.1 | 0.0% / 1.6% |
| S444 | 21 | 4 C I | < 0.1 / 6.1 | 15 | 3 C H | 27 | < 0.1 / 6.1 | 0.0% / 0.0% |
| S4863 | 104 | 3 I | 14400 / 174.2 | 37 | 4 C I | [illegible] | 14.8 / 16.6 | 99.9% / 90.5% |
| S499 | 22 | 1 C H | 0.2 / 6.2 | 20 | 1 C H | 21 | < 0.1 / 6.2 | 100% / 0.0% |
| S641 | 19 | 6 C I | [illegible] | 15 | 5 C I | 15 | 1.0 / 6.4 | -25.0% / 0.0% |
| S713 | 19 | 6 C I | 0.9 / 6.3 | 15 | 5 C I | 15 | 0.6 / 6.4 | 33.3% / -1.6% |
| S953 | 29 | 6 C I | 0.8 / 6.4 | 6 | 5 C H | 7 | < 0.1 / 6.1 | 100% / 4.7% |
| S967 | 29 | 4 C I | 1.1 / 6.3 | 6 | 3 C H | 7 | < 0.1 / 6.1 | 100% / 3.2% |
| C_RAS | 431 | 1028 C I | 724.3 / 57.2 | 370 | 1026 C I | 415 | 424.0 / 51.8 | 41.5% / 9.4% |
| D_DASA | 115 | 6 C I | 19.7 / 7.8 | 100 | 5 C I | 200 | 33.0 / 11.6 | -67.5% / -48.7% |
| D_DUDD | 129 | 13 C I | 953.3 / 112.8 | 100 | 11 C H | 2568 | 359.1 / 33.7 | 62.3% / 70.1% |
| I_IBBC | 195 | 5 C H | 145.3 / 11.4 | 40 | 3 C H | 41 | 4.4 / 6.4 | 97.0% / 43.9% |
| I_IFAR | 413 | 5 I | 14400 / 87.0 | 139 | 12 C I | 719 | 2302 / 102.0 | [illegible] |
| I_IFEC | 182 | 6 C I | 66.3 / 8.4 | 45 | 2 C H | 151 | 28.0 / 6.9 | [illegible] |
| L_EMQ | 220 | 8 C H | 323.7 / 17.0 | 88 | 5 C H | [illegible] | 205.6 / 33.0 | 36.5% / -94.1% |
| L_EXEC | 535 | 5 H | 14400 / 63.2 | 137 | 9 C I | 1856 | 593.6 / 103.2 | 95.9% / -63.3% |
| L_FLUSH | 159 | 4 C I | 37.4 / 7.7 | 1 | 2 C H | 2 | < 0.1 / 6.2 | 100% / 19.5% |
| L_PNTR | 541 | 6 C I | 6687 / 138.5 | 245 | 3 C I | 242 | 2423 / 51.2 | 63.8% / 63.0% |
| L_TBWK | 307 | 6 C H | 184.1 / 9.1 | 124 | 4 C H | 123 | 74.0 / 7.4 | 59.8% / 18.7% |
| S_SCU1 | 373 | 14 C H | 8934 / 165.8 | 200 | 12 C H | 755 | 1195 / 118.1 | [illegible] |
| V_CACH | 173 | 11 C H | 92.1 / 17.2 | 97 | 8 C I | [illegible] | 20.0 / 8.9 | [illegible] |
| V_DIR | 178 | 8 C H | 57.9 / 8.3 | 83 | 1 C I | 95 | 11.1 / 7.0 | [illegible] |
| V_L2FB | 75 | 4 C I | 2.9 / 6.3 | 26 | 2 C H | 27 | < 0.1 / 6.1 | 100% / 3.2% |
| V_SCR1 | 150 | 20 C H | 250.0 / 17.7 | 48 | 17 C I | 90 | 5.0 / 15.5 | 98.0% / 12.4% |
| V_SCR2 | 551 | 22 C I | 1201 / 105.0 | 82 | 20 C I | 220 | 260.0 / 36.7 | 78.4% / 65.0% |
| V_SNPC | 93 | 4 C H | 4.9 / 6.6 | 21 | 1 C H | 17 | < 0.1 / 6.2 | 100% / 6.1% |
| W_GAR | 242 | 11 C I | 109.8 / 25.0 | 90 | 9 C H | 191 | 82.5 / 13.0 | 24.9% / 48.0% |
| W_SFA | 64 | 7 C I | 3.7 / 6.8 | 42 | 6 C I | 14 | 3.6 / 6.9 | 2.7% / -1.5% |

tional relations to future time frames, and (3) removing peripheral registers by convert- 
ing them into simple temporal offsets. We implemented the presented algorithm in a 
transformation-engine-based tool architecture that allows an efficient iteration between 
multiple reduction engines before the model is passed to a terminal reachability algo- 
rithm. Our experiments based on standard benchmarks and industrial circuits indicate 
that the presented approach significantly increases the capacity of standard reachability 
algorithms. In particular, we demonstrated that the repeated interleaved application of 
retiming and other restructuring algorithms in a transformation-based setting can yield 
reduction results that cannot be achieved with a monolithic approach. 

In this paper the application of retiming is focused on minimizing the total number 
of registers as an approximate method for enhancing reachability analysis. It does not 
take into account that the actual register placement can have a significant impact on 
other algorithms used for improving symbolic state traversal. An interesting problem 
[Figure: Symbolic Reachability Profile for S3330]
Fig. 4: BDD size profile for traversing S3330 with method IWLS95 after various transformations.



for future research is to extend the formulation of structural transformations beyond 
simple retiming to obtain a more global approach for improving reachability analysis. 
Abstract. We propose a BDD based representation for Boolean functions,
which extends conjunctive/disjunctive decompositions. The model
introduced (Meta-BDD) can be considered as a symbolic representation
of k-Layer automata describing Boolean functions. A layer is the set of
BDD nodes labeled by a given variable, and its characteristic function
is represented using BDDs. Meta-BDDs are implemented upon a standard
BDD library and they support layered (decomposed) processing of
Boolean operations used in formal verification problems. Besides targeting
reduced BDD size, the theoretical advantage of this form over other
decompositions is being closed under complementation, which makes
Meta-BDDs applicable to a broader range of problems.



1 Introduction 

Binary Decision Diagrams [?] (BDDs) are a core technique for several applications
in the field of Formal Verification and Synthesis. They provide compact
implicit forms for functions depending on tens to hundreds of Boolean variables.

Many variants of the original BDD type¹ have been proposed to explore
possible optimizations and extensions (see for example a survey in [?]). Dynamic
variable ordering techniques (sifting [?]) have played a key role to push forward
the applicability of BDDs and to face the ordering dependent memory explosion
problem. Partitioned [?] and decomposed [?] forms have also been followed as
a divide-and-conquer attempt to scale down the complexity of symbolic operations.

This paper follows the latter trend. We propose a decomposed representation
for Boolean functions, which extends conjunctive/disjunctive decompositions.
One of the limitations of conjunctive (disjunctive) decompositions is that they
are biased to the zeroes (ones) of a Boolean function. Let us consider for instance
a conjunctive form $f = \bigwedge_i f_i$: each one of the $f_i$ components describes a subset
of the zeroes (the OFF-set) of f. Dually for disjunctive forms. Both forms are
not closed under negation: the negation of a conjunctive form is disjunctive
(and vice-versa), so the application requires both forms, and practical/heuristic
simplification rules, unless all formulas can be put in positive normal form.

¹ Reduced Ordered BDDs (ROBDDs), or simply BDDs whenever no ambiguity arises

G. Berry, H. Comon, and A. Finkel (Eds.): CAV 2001, LNCS 2102, pp. 118-^^^ 2001. 
Our work proposes a decomposed form evenly oriented to represent both the
zeroes and the ones of a Boolean function. Besides looking at a compact format,
we look for efficient symbolic manipulation in the decomposed form. Our
solution can be canonical, it is closed under negation, and it supports standard
Boolean operations and quantifiers, so it may be applied to BDD based combinational
and sequential verification problems. To find the most suitable way of
describing this new decomposed form, we adopt an automaton model, which has
recently been proposed to describe Boolean functions within an explicit reachability
framework [?]. We see a BDD (and the related Boolean function) as an
automaton, and we describe it through a set of BDDs. We thus use the term
Meta-BDD for the decomposed form.

In the sequel, we will briefly overview some preliminary concepts and related 
works, then we will introduce Meta-BDDs and the related symbolic manipula- 
tions. We will finally present some experimental results attained with a prototype 
implementation. 



2 Preliminaries and Related Works 

Binary Decision Diagrams (BDDs) [?] are directed acyclic graphs providing
a canonical representation of Boolean functions. Starting from a non reduced
Ordered BDD (OBDD), the Reduced OBDD (ROBDD [?]) for a given Boolean
function is obtained by repeatedly applying two well known reduction rules: (1)
Merging rule (two isomorphic subgraphs are merged), and (2) Deletion rule (a
BDD node whose two branches point to the same successor is deleted).

Simple graph algorithms, working depth-first on BDDs, implement many
operators: apply, ite (if-then-else), and existential/universal quantifiers are
well-known examples. BDDs have been widely used in verification problems to
represent functions as well as sets, by means of their characteristic functions.
Operations on sets are efficiently implemented by Boolean operations on their
characteristic functions. The notation $\chi_A$ is usually adopted for the characteristic
function of a set A. For instance, let A, B be two sets, and $\chi_A$, $\chi_B$ their
characteristic functions, we write:

\[
\chi_{A \cup B} = \chi_A \vee \chi_B, \qquad \chi_{A \cap B} = \chi_A \wedge \chi_B, \qquad \chi_{A - B} = \chi_A \wedge \neg\chi_B
\]

For sake of simplicity, we make a little abuse of notation in the rest of this
paper, and we make no distinction between the BDD representing a set, the
characteristic function of the set and the set itself.
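
The correspondence between set operations and Boolean operations on characteristic functions can be illustrated without any BDD library. The sketch below is an illustration only: explicit Python predicates over bit tuples stand in for BDDs, and the helper names `chi` and `members` are assumptions introduced here.

```python
from itertools import product


def chi(elements):
    """Characteristic function of a set of bit tuples."""
    frozen = frozenset(elements)
    return lambda x: x in frozen


def members(f, n):
    """Set of all n-bit tuples on which the Boolean function f is true."""
    return {x for x in product((0, 1), repeat=n) if f(x)}


A = {(0, 0), (0, 1)}
B = {(0, 1), (1, 1)}
chi_a, chi_b = chi(A), chi(B)

# Set operations expressed as Boolean operations on the characteristic functions.
union = lambda x: chi_a(x) or chi_b(x)
intersection = lambda x: chi_a(x) and chi_b(x)
difference = lambda x: chi_a(x) and not chi_b(x)

assert members(union, 2) == A | B
assert members(intersection, 2) == A & B
assert members(difference, 2) == A - B
```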

2.1 State Sets Represented by k-Layer DFAs

A given Boolean function $f(x) : \{0,1\}^k \rightarrow \{0,1\}$ is represented by Holzmann and
Puri [5] as a Deterministic Finite Automaton (DFA). They introduce k-Layer
DFAs to describe sets of states within a verification framework based on explicit
reachability. An automaton accepts input strings of length k. In the Boolean
case, {0, 1} is the input alphabet, the automaton accepts a set $S \subseteq \{0,1\}^k$ of
k-tuples, and each layer in the automaton corresponds to an input bit of the
function describing a state set.

A k-Layer DFA has one initial and two terminal states, the accepting terminal
state 1, and the rejecting terminal state 0. The automaton is minimized



120 Gianpiero Cabodi 




Fig. 1. A BDD (a) and a k-Layer DFA (b) for the same Boolean function. The
deletion rule is not applied to the k-Layer DFA.



if states which have exactly the same successors are merged together. The automaton
describing a Boolean function has a close relationship with the BDD
representing it, with a variable ordering corresponding to the input string ordering.
A k-Layer DFA is minimized by only using the merging rule, whereas the
deletion rule is avoided, to keep input strings of length k.

As an example, Figure 1 shows the BDD (a) and the DFA (b) for the same
Boolean function. Given a BDD variable ordering corresponding to the layers,
the two representations have similar shapes, but no implicit variables are present
in the k-Layer DFA.
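
The effect of dropping the deletion rule can be made concrete with truth tables. The following sketch rests on two modeling assumptions made here, not taken from the paper: a layer of the k-Layer DFA is taken to hold one state per distinct subfunction obtained by fixing the preceding variables (merging rule only), and the BDD layer keeps only those subfunctions that actually depend on the layer's variable (merging plus deletion). The example function $f = x_1 x_2 \vee x_3 x_4$ and the helper names are likewise chosen for illustration.

```python
from itertools import product


def cofactor(f, prefix, n):
    """Truth table of f with the first len(prefix) variables fixed to prefix."""
    rest = n - len(prefix)
    return tuple(f(prefix + tail) for tail in product((0, 1), repeat=rest))


def depends_on_first(table):
    """True if the subfunction changes when its first remaining variable flips."""
    half = len(table) // 2
    return table[:half] != table[half:]


def layer_sizes(f, n):
    """States per layer with merging only (DFA) and with merging plus deletion (BDD)."""
    dfa, bdd = [], []
    for i in range(n):
        subfunctions = {cofactor(f, p, n) for p in product((0, 1), repeat=i)}
        dfa.append(len(subfunctions))
        bdd.append(sum(1 for t in subfunctions if depends_on_first(t)))
    return dfa, bdd


f = lambda x: int((x[0] and x[1]) or (x[2] and x[3]))
dfa, bdd = layer_sizes(f, 4)
print("k-Layer DFA states per layer:", dfa)
print("BDD nodes per layer:         ", bdd)
```

Under these assumptions the layered automaton carries states for constant subfunctions that the BDD removes, so its layers are never smaller than the corresponding BDD layers.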



2.2 McMillan’s Conjunctive Decomposition 

McMillan’s canonical conjunctive decomposition [?] is another relevant starting
point for this work. The automaton representation is proposed by McMillan,
too. He sees a BDD representing a set of states as a “finite state automaton that
reads the values of the state variables in some fixed order, and finally accepts or
rejects the given valuation”.

A function $f(x_1, \ldots, x_n)$ is decomposed as $f = \bigwedge_{i=1}^{n} f_i$, each conjunctive
component being defined as $f_i = f^{(i)} \downarrow f^{(i-1)}$. The functions $f^{(i)}$ are the projections
of f onto growing sets of variables, $f^{(i)}(x_1, \ldots, x_i) = \exists (x_{i+1}, \ldots, x_n).\, f(x)$, and
$\downarrow$ is the generalized cofactor “constrain” operator [?]. Conjunctive components
have growing support ($f_i = f_i(x_1, \ldots, x_i)$), and the representation is canonical
given a variable order for projections (which is not necessarily the same as the
BDD variable order). Cofactoring is a major source of BDD size reduction for
this representation, due to the BDD simplification properties of the generalized
cofactor operator: the BDD representing $g \downarrow f$ is often (not always²) smaller
than the BDD of g, and the decomposition $\bigwedge_i f^{(i)} \downarrow f^{(i-1)}$ exploits this fact,
especially in cases of factors with disjoint supports or supports
including conditionally independent variables.

Conjunction, disjunction and projection (existential quantification)
algorithms are also proposed in [?], in order to use the decomposition in symbolic
model checking problems that can be put in positive normal form (with
negation only allowed on literals). They can be summarized as follows.

Conjunction is the simplest operation. The result of a conjunction fg is computed
in two steps. An intermediate result t is first evaluated by conjoining the
couples of corresponding components $f_i$ and $g_i$ bottom-up (with decreasing i),
and applying a “reduction”³ process:

\[
t_n = f_n\, g_n, \qquad t_{i-1} = f_{i-1}\, g_{i-1}\, \exists x_i .\, t_i
\]

The intermediate result is then “normalized” top-down (by increasing i) for
canonicity (and BDD simplification): $h_i = t_i \downarrow h_1 \downarrow \ldots \downarrow h_{i-1}$. The decomposed
conjunction thus results in a linear number of conjunction and projection
operations, and a quadratic number of cofactor operations.

Disjunction is a less natural operation for conjunctive decompositions. The
i-th component of $h = f \vee g$ should be evaluated by taking into account all
components of the operands from 1 to i:

\[
t_i = \bigwedge_{j=1}^{i} f_j \vee \bigwedge_{j=1}^{i} g_j, \qquad h_i = t_i \downarrow h_1 \downarrow \ldots \downarrow h_{i-1}
\]

This is not efficient, because of the explicit computation of conjunctions
and delayed normalization. So a more efficient computation, with interleaved
normalization, is proposed:

\[
h_i = \bigwedge_{j=1}^{i} (f_j \downarrow h_1 \downarrow \ldots \downarrow h_{i-1}) \vee \bigwedge_{j=1}^{i} (g_j \downarrow h_1 \downarrow \ldots \downarrow h_{i-1})
\]

resulting in a quadratic number of conjunction and cofactor, and a linear number
of disjunction operations (which is more complex than the previous conjunction
case).

Projection (existential quantification $h = \exists S.\, f$) has the same problems as
disjunction, and it is computed in a similar way:
$h_i = \exists S.\,\bigl(\bigwedge_{j=1}^{i} (f_j \downarrow h_1 \downarrow \ldots \downarrow h_{i-1})\bigr)$.
Again a quadratic number of conjunction and cofactor operations (and a
linear number of quantifications) is required.

² Since “constrain” may introduce new variables (and BDD nodes) in the cofactored
term, simplification is not always achieved. Other variants of generalized cofactor
have been introduced to especially address simplification tasks. An example is the
“restrict” cofactor which locally abstracts from f variables not found in g. But
some nice properties of “constrain” are lost, and canonicity of conjunctive decomposition
is not guaranteed.

³ Reduction is a term we bring from breadth-first BDD manipulation (indicating a
postponed application of merging and deletion rules). It was not used in [?].
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2.3 Incompletely Specified Boolean Fnnctions 

An incompletely specified Boolean function is a function defined over a subset
of $\{0, 1\}^n$. The domain points where the function is not defined are called
the don't care set, whereas the points where the function is defined as true or
false are called the ON-set and OFF-set, respectively (the union of ON-set and
OFF-set is called the care set).

Given an incompletely specified function $f$, we will use the notation $f.on$
for the ON-set, $f.off$ for the OFF-set, and $f.dc$ for the don't care set. Any
two of them are enough to completely characterize $f$, since they are pairwise
disjoint and their union is the domain space. So $f$ might be represented
by the couple $f = (f.on, f.off)$, with $f.dc = \neg(f.on \vee f.off)$. Another
way to represent $f$ is the interval of completely specified functions
$f = [f.on, \neg f.off]$.
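The three sets can be illustrated on a tiny truth table. The sketch below is only an illustration, assuming functions are stored as explicit sets of minterms rather than BDDs; the helper name `make_isf` is hypothetical and not part of the original work.

```python
from itertools import product

def make_isf(n, on, off):
    """Return (on, off, dc) for an incompletely specified function on {0,1}^n."""
    domain = frozenset(product((0, 1), repeat=n))
    on, off = frozenset(on), frozenset(off)
    if on & off:
        raise ValueError("ON-set and OFF-set must be disjoint")
    # The don't care set is whatever the ON-set and OFF-set leave uncovered.
    return on, off, domain - on - off

# f is 1 on 11, 0 on 00, and unspecified on 01 and 10.
f_on, f_off, f_dc = make_isf(2, on={(1, 1)}, off={(0, 0)})

# Interval view [f.on, not f.off]: every completely specified g with
# lower <= g <= upper agrees with f on its care set.
lower = f_on
upper = f_on | f_dc  # complement of the OFF-set
```

Any two of `f_on`, `f_off`, `f_dc` determine the third, which is why the couple (ON-set, OFF-set) is enough.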



3 Meta-BDDs. Describing a BDD by a Layered Set of 
BDDs 

This section defines Meta-BDDs. They are not a new type of Decision Diagram 
for Boolean functions. We introduce them as a layered set of BDDs used to 
describe a Boolean function. We view a BDD as a DFA, and we use other BDDs 
to describe it by layers of variables, and to symbolically encode breadth-first 
computations of BDD operators. 

We can also view Meta-BDDs as an extension of McMillan’s canonical con- 
junctive decomposition Q. Our representation is more general, since it includes 
conjunctive as well as disjunctive decompositions, and it is closed under Boolean 
negation. It is canonical under proper conditions. 

Let us define the $i$-th layer as the set of nodes labeled by the $x_i$ variable.
We characterize the layer with the BDD paths reaching terminals (either 1 or 0)
from $x_i$ nodes. In the automaton view of BDDs, this means that the accepting
or rejecting final state is decided when testing the $x_i$ variable. In the case of
Figure 1 there is no path to terminal nodes from the $x_1$ layer, there is one
path to terminal 1 from the $x_2$ layer ($x_1x_2 = 11$), 3 paths to 0 at the $x_3$
layer ($x_1x_2x_3 \in \{000, 010, 100\}$), and, from the $x_4$ layer, 3 paths to 1
($x_1x_2x_3x_4 \in \{0011, 0111, 1011\}$) and 3 paths to 0
($x_1x_2x_3x_4 \in \{0010, 0110, 1010\}$).

We describe a layer of a given function $f$ by means of a function capturing
the zeroes (paths to 0) and ones (paths to 1) of $f$ at that layer. More specifically,
we encode the $i$-th layer of $f$ with an incompletely specified function $f_i$, such
that the ON-set ($f_i.on$) is the set of ones of $f$ at the $i$-th layer, and the OFF-set
($f_i.off$) is the set of zeroes of $f$ at the $i$-th layer. As a consequence, the don't
care set ($f_i.dc$) is the set of ones/zeroes reached by $f$ at other layers.

We informally introduce the Meta representation of $f$ (Meta-BDD if symbolically
encoded by BDDs), as the set of $f_i$ layers that completely characterize $f$.
For each layer we represent the two sets of paths leading to the 1 and 0 terminals
at that layer. In the case represented in Figure 1, this leads to the Meta form of
Figure 2(a).

A more accurate and formal definition of Meta-BDDs is obtained by introducing
the Meta operator $\langle\,\rangle^M$, working on incompletely specified Boolean
functions.
(a)
    $f_1 = (0, 0)$
    $f_2 = (x_1x_2, 0)$
    $f_3 = (0, \neg(x_1x_2)\neg x_3)$
    $f_4 = (\neg(x_1x_2)x_3x_4, \neg(x_1x_2)x_3\neg x_4)$

(b)
    $f_1 = (0, 0)$
    $f_2 = (x_1x_2, 0)$
    $f_3 = (0, \neg x_3)$
    $f_4 = (x_4, \neg x_4)$

Fig. 2. Layered Meta representations of $f$. Each $f_i = (f_i.on, f_i.off)$ is exactly
defined only for the ones and zeroes of $f$ at the corresponding layer (a). Upper
layers are used to simplify lower $f_i$'s by means of cofactoring (b).



Definition 1. Given two incompletely specified Boolean functions $f$ and $g$,
$h = \langle f, g \rangle^M$ is an incompletely specified function, such that the
ON-set (OFF-set) of $h$ is the ON-set (OFF-set) of $f$ augmented with the portion
of the ON-set (OFF-set) of $g$ not covered by the care set of $f$:

$$\langle f, g \rangle^M = \mathrm{ITE}(\neg f.dc, f, g) \qquad (1)$$

The above definition can be rewritten as $\langle f, g \rangle^M = h$ such that

$$h.on = f.on \vee g.on \wedge \neg f.off$$
$$h.off = f.off \vee g.off \wedge \neg f.on$$
$$h.dc = f.dc \wedge g.dc$$



The operator returns the argument in the unary case ($\langle f \rangle^M = f$) and it is
associative:

$$\langle \langle f, g \rangle^M, h \rangle^M = \langle f, \langle g, h \rangle^M \rangle^M = \langle f, g, h \rangle^M$$
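The Meta operator and its associativity can be checked by brute force on a very small domain. This is a minimal sketch, assuming each incompletely specified function is an (ON-set, OFF-set) pair of explicit point sets; the names `meta`, `dc` and `all_isfs` are illustrative only.

```python
from itertools import product

POINTS = (0, 1)  # truth-table points of a single Boolean variable

def meta(f, g):
    """<f, g>^M = ITE(not f.dc, f, g) on (ON-set, OFF-set) pairs."""
    (f_on, f_off), (g_on, g_off) = f, g
    return (f_on | (g_on - f_off), f_off | (g_off - f_on))

def dc(f):
    """Don't care set: points in neither the ON-set nor the OFF-set."""
    return frozenset(POINTS) - f[0] - f[1]

def all_isfs():
    """Enumerate every incompletely specified function over POINTS."""
    for labels in product(("on", "off", "dc"), repeat=len(POINTS)):
        on = frozenset(p for p, lab in zip(POINTS, labels) if lab == "on")
        off = frozenset(p for p, lab in zip(POINTS, labels) if lab == "off")
        yield (on, off)

ISFS = list(all_isfs())

# Associativity and the don't care rule of Definition 1, checked exhaustively.
associative = all(
    meta(meta(f, g), h) == meta(f, meta(g, h))
    for f in ISFS for g in ISFS for h in ISFS
)
dc_is_intersection = all(
    dc(meta(f, g)) == dc(f) & dc(g) for f in ISFS for g in ISFS
)
```

Since the operator acts point by point, checking it on a one-variable domain already exercises every combination of on/off/don't care values.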

We are now ready to formally define the Meta decomposition of $f$ in $n$
components.

Definition 2. The Meta decomposition of a Boolean function $f$ is an ordered
set of components that produces $f$ if given as argument to the $\langle\,\rangle^M$ operator:

$$f^M_{[1,n]} = \langle f_1, f_2, \dots, f_n \rangle^M = f$$

A Meta-BDD is a BDD representation of a Meta decomposition.

We adopt $[i, j]$ subscripts to indicate intervals of components, and we optionally
omit them if clear from the context (we use $f^M$ instead of $f^M_{[1,n]}$).



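Applying equation 1 and associativity, a Meta decomposition can be recursively
written as

$$f^M_{[1,n]} = \langle f_1, f^M_{[2,n]} \rangle^M = \mathrm{ITE}(\neg f_1.dc, f_1, f^M_{[2,n]})$$

which leads to the following expanded expression for $f$:

$$f = \mathrm{ITE}(\neg f_1.dc, f_1, \mathrm{ITE}(\neg f_2.dc, f_2, \mathrm{ITE}(\dots \mathrm{ITE}(\neg f_{n-1}.dc, f_{n-1}, f_n))))$$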

The inspiring idea of this decomposition is that each component contributes
a new piece to the ON-set and the OFF-set of $f$. In other words, $f$ is progressively
approximated and finally reached by a sequence of incompletely specified
functions $f^M_{[1,1]} \preceq f^M_{[1,2]} \preceq \dots \preceq f^M_{[1,n]} = f$ ordered by the precedence relation

$$f^M_{[1,i]} \preceq f^M_{[1,i+1]} \iff (f^M_{[1,i]}.on \subseteq f^M_{[1,i+1]}.on) \wedge (f^M_{[1,i]}.off \subseteq f^M_{[1,i+1]}.off)$$

The functions have non-decreasing ON-set and OFF-set, and the last one coincides
with $f$.

We have a degree of freedom in selecting the sequence of functions converging
to $f$. Starting from canonical conjunctive decomposition, we adopt the idea of
projecting $f$ onto growing sets of variables, so that $f^M_{[1,i]}$ captures the ones and
zeroes of $f$ at the upper $i$ layers (i.e. the BDD paths reaching 0 or 1 for all
values of the variables at the lower levels ($x > x_i$)). We thus choose
$f^M_{[1,i]}(x_1, \dots, x_i)$ such that

$$f^M_{[1,i]}.on = \forall(x > x_i).f$$
$$f^M_{[1,i]}.off = \forall(x > x_i).\neg f$$

The $f^M_{[1,i]}$ function is represented by the first $i$ terms of the Meta
decomposition $f^M_{[1,n]}$. The definition does not provide a rule to uniquely compute
the $f_i$ components, given $f$. In fact, we have here another degree of freedom,
as the $f_i$ term is partially “covered” by the previous ones ($< i$). So we might
leave it partially unspecified, or better exploit this fact (as in Q) and simplify 
the lower layers by cofactoring them with the don’t care set of the upper ones: 

$$f_i = f^M_{[1,i]} \downarrow f^M_{[1,i-1]}.dc$$

Since the don't care set of $f^M_{[1,i-1]}$ is the intersection of the don't care sets
of the first $i-1$ components ($f^M_{[1,i-1]}.dc = \bigwedge_{j=1}^{i-1} f_j.dc$), we avoid
computing it and we apply to $f_i$ a chain of cofactors:
$f_i = f^M_{[1,i]} \downarrow f_1.dc \downarrow f_2.dc \downarrow \dots \downarrow f_{i-1}.dc$.
This would simplify the representation of Figure 2(a) to the form (b), which is
obviously simpler.
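The two layered forms can be reproduced by enumeration. The sketch below assumes the function of the running example is $f = x_1x_2 \vee x_3x_4$, which is what the listed paths suggest, and works on explicit minterm sets instead of BDDs; all helper names are hypothetical.

```python
from functools import reduce
from itertools import product

N = 4
DOMAIN = list(product((0, 1), repeat=N))

def f(x):
    x1, x2, x3, x4 = x
    return bool((x1 and x2) or (x3 and x4))

def sat(pred):
    """Set of assignments satisfying pred."""
    return frozenset(x for x in DOMAIN if pred(x))

def forall_lower(pred, i):
    """Assignments whose first i bits force pred whatever the lower bits are."""
    tails = list(product((0, 1), repeat=N - i))
    return frozenset(x for x in DOMAIN if all(pred(x[:i] + t) for t in tails))

def meta(a, b):
    """Meta operator on (ON-set, OFF-set) pairs."""
    return (a[0] | (b[0] - a[1]), a[1] | (b[1] - a[0]))

# f_[1,i] = (forall(x > x_i).f, forall(x > x_i).not f) for i = 1..N
prefix = [(forall_lower(f, i), forall_lower(lambda x: not f(x), i))
          for i in range(1, N + 1)]

# Form (a): each layer keeps only what is new with respect to the upper ones.
form_a = [prefix[0]] + [(prefix[i][0] - prefix[i - 1][0],
                         prefix[i][1] - prefix[i - 1][1]) for i in range(1, N)]

# Form (b): the simplified layers, written out explicitly.
empty = frozenset()
form_b = [
    (empty, empty),
    (sat(lambda x: x[0] and x[1]), empty),
    (empty, sat(lambda x: not x[2])),
    (sat(lambda x: x[3]), sat(lambda x: not x[3])),
]

target = (sat(f), sat(lambda x: not f(x)))
```

Folding either list with `reduce(meta, ...)` gives back `target`: in form (b) the lower layers overlap the care sets of the upper ones, but the operator lets the upper layers decide first.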



Meta-BDDs and Conjunctive Decompositions. Meta decompositions include
McMillan's conjunctive decomposition as the particular case with
$f^M_{[1,i]}.on = 0$ for all $i < n$ and
$f^M_{[1,i]}.off = \forall(x > x_i).\neg f = \neg\exists(x > x_i).f$.

Given the above assumptions, the $i$-th component of $f^M$ is
$f_i = (0, \neg\exists(x > x_i).f \downarrow \exists(x > x_{i-1}).f)$, where the OFF-set
is the complement of McMillan's generic conjunctive term.



Variable Ordering and Grouping. The ordering applied to the definition of a 
Meta-BDD is not required to be the same as the BDD ordering. Moreover, layers 
can be extended to groups of variables, i.e. each Xi in the previous definitions is 
a set of variables instead of a single variable. This has the advantage of reducing 
the number of layers in the decomposition, where each layer includes a set of 
variables (and the corresponding edges to terminals). 

In our implementation, we observed the best performance when using the same
order for variables and layers, and grouping variables. For the cases we addressed
(100 to 200 state variables), reasonable group sizes are 10 to 30 variables.
Dynamic variable ordering is supported, provided that variable layers are rebuilt
each time a new variable order is produced. The overhead we experienced for
this transformation is low compared to sifting time, and to the overall cost of
image computations, since rebuilding variable groups is a linear operation and
transforming a Meta-BDD from old layers to new ones is a variant of the BDD
to Meta-BDD transformation (described in the next section).



Meta-BDDs and Canonicity. Meta-BDDs are canonical under conditions
similar to McMillan's decomposition, i.e. that layer simplifications are done
through the constrain cofactor. Canonicity guarantees a constant time equality
check, but our experience in sequential reachability shows that we may give it up
whenever non-canonical representations produce memory reduction.

This is often the case for Meta-BDDs, where a conditional application of
reduction and constrain simplification may select the decompositions that
produce benefits and abort the bad ones. This is a major point for efficiency
in our implementation, where reductions and cofactorings are controlled by
heuristic decisions based on BDD size.

We also experienced the “restrict” cofactor Q, with worse results on the av- 
erage, compared with controlled application of constrain. A possible reason for 
this fact is that restrict guarantees good local optimizations of individual func- 
tions, but operations involving restricted functions may blow up when combining 
terms with different restrict optimizations. 



4 Symbolic Operations on Meta-BDDs 

We describe in this section how basic Boolean operators can be applied to Meta- 
BDD decompositions. In particular, we will concentrate on the operations re- 
quired by sequential verification tasks: standard logic operators and quantifiers. 
Our procedures are here proposed for the canonical case, and we omit for sim- 
plicity heuristic decision points for conditional application of reductions and 
cofactor simplifications. 

First of all, Meta-BDDs provide a constant time Not operation whose result
is again a Meta-BDD. In fact, since Boolean negation swaps zeroes with ones,
$\neg f^M$ is computed by simply swapping the $(f_i.on, f_i.off)$ couples.
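The swap can be checked on a small layered example. This is only a sketch on explicit minterm sets, assuming the four layers of the simplified form of $f = x_1x_2 \vee x_3x_4$; the names `meta_not` and `sat` are hypothetical.

```python
from functools import reduce
from itertools import product

DOMAIN = list(product((0, 1), repeat=4))

def sat(pred):
    """Set of assignments satisfying pred."""
    return frozenset(x for x in DOMAIN if pred(x))

def meta(a, b):
    """Meta operator on (ON-set, OFF-set) pairs."""
    return (a[0] | (b[0] - a[1]), a[1] | (b[1] - a[0]))

def meta_not(layers):
    """Complement a Meta decomposition by swapping each (ON, OFF) couple."""
    return [(off, on) for on, off in layers]

empty = frozenset()
# Simplified layers of f = x1 x2 or x3 x4.
layers = [
    (empty, empty),
    (sat(lambda x: x[0] and x[1]), empty),
    (empty, sat(lambda x: not x[2])),
    (sat(lambda x: x[3]), sat(lambda x: not x[3])),
]

f_on, f_off = reduce(meta, layers)
neg_on, neg_off = reduce(meta, meta_not(layers))
```

The swap never inspects the contents of the sets, which is why negation does not depend on the size of the layers.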

Going to BDD/Meta-BDD conversions and Apply-like operations, we operate
them through a layered process, which is inspired by Q, where BDD operations 
are performed through a breadth-first two phase (Apply-Reduce) technique. But 
our method is implicit, we operate breadth-first through layer-by-layer itera- 
tions on the T-ROBDD implicit structure, represented by BDDs. 

Figure 3 shows how we convert a BDD to a Meta-BDD. We initially assign all
terminal edges to the bottom layer, i.e. we initialize all Meta-BDD components to
0, except the last one. Then we perform reduction and constrain simplification.

Reduction and constrain simplification are shown in Figure 4. MetaReduce
is a bottom-up process which finds BDD nodes with both cofactors pointing to
the same terminal. The merged edges are moved to the upper layers. MetaConstrain
(Figure 4(b)) operates the cofactor based simplification. The reduction
and constrain operations are here represented by dedicated procedures, as
post-processing steps of Meta-BDD operations. For best performance, they can be
integrated within breadth-first manipulations, and operated by layers as soon
as possible.
BDD2Meta($f$)
    for $i = 1$ to $n - 1$
        $f_i \leftarrow (0, 0)$
    $f_n \leftarrow (f, \neg f)$
    MetaReduce($f^M$)
    MetaConstrain($f^M$)
    return $f^M$

Fig. 3. Converting a BDD to Meta-BDD. All terminal edges are initially assigned
to the bottom ($n$-th) layer. They are moved to the proper upper layers by
the reduction procedure; constrain simplification is finally operated.



MetaReduce($f^M$)
    for $i = n - 1$ downto 1
        $f_i.on \leftarrow f_i.on \vee \forall(x > x_i).f_{i+1}.on$
        $f_i.off \leftarrow f_i.off \vee \forall(x > x_i).f_{i+1}.off$
        $f_{i+1} \leftarrow f_{i+1} \downarrow f_i.dc$

(a)

MetaConstrain($f^M$)
    for $i = 1$ to $n - 1$
        for $j = i + 1$ to $n$
            $f_j \leftarrow f_j \downarrow f_i.dc$

(b)

Fig. 4. Reduction (a). Terminal edges are moved upward by a bottom-up iterative
process. Move is achieved by adding the reduced part to $f_i$, then deleting it
from $f_{i+1}$ using cofactor. Constrain based simplification (b). A double iteration
is operated to avoid explicit computation of $f^M_{[1,i]}.dc$.



As an example of Apply operation, we show the conjunction (MetaAnd)
procedure. Disjunction is obtained for free in terms of complementation and
conjunction. The proposed algorithm is based on the following theorem.

Theorem 1. Let $f^M_{[1,n]}$ and $g^M_{[1,n]}$ be Meta decompositions, then

$$f^M_{[1,n]} \wedge g^M_{[1,n]} = \langle r_1, \dots, r_n \rangle^M$$

with $r_i$ computed as:

$$r_i.on = \bigvee_{j=1}^{i} f_j.on \wedge \bigvee_{j=1}^{i} g_j.on$$
$$r_i.off = f_i.off \wedge \neg\bigvee_{j=1}^{i-1} f_j.on \;\vee\; g_i.off \wedge \neg\bigvee_{j=1}^{i-1} g_j.on$$

Proof Sketch. Let us consider conjunction as a symbolic breadth-first visit of
the product automaton of $f$ and $g$. The set of paths reaching 0 at the $i$-th layer
is given by the paths where either $f$ or $g$ is 0 at the $i$-th layer, and is
not 1 at the upper ones ($< i$). The set of paths to 1 is given by the paths
where both automata ($f$ and $g$) reach 1 at one of the first $i$ layers.
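The layered conjunction rule can be tried out on explicit sets. The sketch below assumes the reading of the proof sketch in which each OFF-set is masked by the ON-sets collected from the upper layers of the same operand; it omits cofactoring, reduction and constrain, and the example functions and helper names are hypothetical.

```python
from functools import reduce
from itertools import product

DOMAIN = list(product((0, 1), repeat=4))
EMPTY = frozenset()

def sat(pred):
    """Set of assignments satisfying pred."""
    return frozenset(x for x in DOMAIN if pred(x))

def meta(a, b):
    """Meta operator on (ON-set, OFF-set) pairs."""
    return (a[0] | (b[0] - a[1]), a[1] | (b[1] - a[0]))

def meta_and(fs, gs):
    """Layer-by-layer conjunction of two Meta decompositions."""
    f_up, g_up = EMPTY, EMPTY  # ON-sets collected from the upper layers
    layers = []
    for (f_on, f_off), (g_on, g_off) in zip(fs, gs):
        # Zeroes: either operand is 0 here and was not 1 at an upper layer.
        r_off = (f_off - f_up) | (g_off - g_up)
        f_up, g_up = f_up | f_on, g_up | g_on
        # Ones: both operands reached 1 at one of the layers seen so far.
        layers.append((f_up & g_up, r_off))
    return layers

# f = x1 x2 or x3 x4, in simplified layered form.
fs = [(EMPTY, EMPTY), (sat(lambda x: x[0] and x[1]), EMPTY),
      (EMPTY, sat(lambda x: not x[2])),
      (sat(lambda x: x[3]), sat(lambda x: not x[3]))]
# g = not x1 or x3, with a third layer that overlaps the first one.
gs = [(sat(lambda x: not x[0]), EMPTY), (EMPTY, EMPTY),
      (sat(lambda x: x[2]), sat(lambda x: not x[2])), (EMPTY, EMPTY)]

r_on, r_off = reduce(meta, meta_and(fs, gs))
expected_on = sat(lambda x: ((x[0] and x[1]) or (x[2] and x[3]))
                  and (not x[0] or x[2]))
```

The third layer of `gs` has an OFF-set that intersects the ON-set of its first layer, so the example exercises the masking by upper ON-sets.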

Figure 5(a) shows our algorithm for conjunction. $f'$ and $g'$ are used to collect
the overall ON-sets of the upper components ($\bigvee_j f_j.on$ and $\bigvee_j g_j.on$ in
Theorem 1). Explicit computation of the above terms would be in contrast with

Due to the cofactor based simplification, $f_i.off$ ($g_i.off$) could intersect the onset of
upper components.
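MetaAnd($f^M$, $g^M$)
    $f' \leftarrow 0$, $g' \leftarrow 0$, $dc \leftarrow 1$
    for $i = 1$ to $n$
        $f' \leftarrow f' \downarrow dc$
        $g' \leftarrow g' \downarrow dc$
        $r_i.off \leftarrow f_i.off \downarrow dc \wedge \neg f' \vee g_i.off \downarrow dc \wedge \neg g'$
        $f' \leftarrow f' \vee f_i.on \downarrow dc$
        $g' \leftarrow g' \vee g_i.on \downarrow dc$
        $r_i.on \leftarrow f' \wedge g'$
        $dc \leftarrow \neg(r_i.on \vee r_i.off)$
    $r^M \leftarrow (r_1, \dots, r_n)$
    MetaReduce($r^M$)
    MetaConstrain($r^M$)
    return $r^M$

(a)

MetaExist($f^M$, $S$)
    $f' \leftarrow 1$, $dc \leftarrow 1$
    $r^M \leftarrow f^M$
    for $i = 1$ to $n$
        $f' \leftarrow f' \downarrow dc$
        $r_i \leftarrow r_i \downarrow dc$
        $r_i.on \leftarrow \exists S.\wedge(f', r_i.on)$
        $f' \leftarrow \wedge(f', \neg f_i.off)$
        $r_i.off \leftarrow \neg\exists S.f'$
        $dc \leftarrow \neg(r_i.on \vee r_i.off)$
    MetaReduce($r^M$)
    MetaConstrain($r^M$)
    return $r^M$

(b)

Fig. 5. Breadth-first computation of Boolean And (a) and existential quantification
(b). The layers of the result are computed through top-down layered visits
of the operands.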

the purposes of the decomposed representation. We thus interleave the layer
computations with cofactoring based simplifications, which allow us to iteratively
project $f'$ and $g'$ on decreasing subsets of the domain space. Cofactoring is
done both to keep BDD sizes under control, and to achieve a preliminary reduction.
Full bottom-up reduction and final constrain simplification are explicitly
called as last steps.

Existential quantification (MetaExist procedure) is shown in Figure 5(b).
Computation is again top-down, and based on the following theorem.

Theorem 2. Let $f^M_{[1,n]}$ be a Meta decomposition, then

$$\exists S.f^M_{[1,n]} = \langle \exists S.f_1, \exists S.(\neg f_1.off \wedge f^M_{[2,n]}) \rangle^M$$

Proof Sketch. Let us again concentrate on the layered automaton view of
$\exists S.f^M_{[1,n]}$. The existential quantification of the first component ($\exists S.f_1$)
captures all ones and zeroes reached at the first layer of the non-reduced result
(other ones/zeroes might be hoisted up when reducing lower levels). The ones and
zeroes at lower layers ($> 1$) are computed working with $f^M_{[2,n]}$ (lower layers of
the operand). But spurious ones introduced by cofactor transformations could
produce wrong (overestimated) ones, so we need to force 0 within the OFF-set
of the first component.

The algorithm of Figure 5(b) uses $f'$ to accumulate the filtering function
(conjunction of complemented OFF-sets). We do not represent $f'$ as a monolithic
BDD, since this would again be a violation of our primary goal (decomposition).
We thus use “clustered” BDDs (partitioned conjunctions performed
under threshold control), and we also interleave layer computations with cofactor
simplifications (as in MetaAnd).

We compute the existential quantification of an incompletely specified function as
$\exists S.f = (\exists S.f.on, \forall S.f.off)$.

Existential quantification is by far the most important (and expensive) operation
in symbolic reachability analysis. Due to its combined usage with conjunction
within image/preimage computations, BDD packages provide the so-called
“relational product” or “and-exist” operator, a recursive procedure specifically
conceived to avoid the explicit intermediate BDD generated as a result of
conjunction before existential quantification. We did the same with Meta-BDDs, and
we implemented a MetaAndExist procedure (not shown here) which properly
integrates the previously shown MetaAnd and MetaExist algorithms.



5 Experimental Results 

The presented technique has been implemented and tested within a home-made
reachability analysis tool, built on top of the Colorado University Decision
Diagram (CUDD) package. The experiments shown here are limited to reachability
analysis of Finite State Machines, as a first and general framework, unrelated
to the verification of specific properties. Our main goal is to prove that
the sequential behavior of the circuits presented can be analyzed with relevant
improvements by using Meta-BDDs. Combinational verification as well as BDD
based SAT checks are other possible applications of Meta-BDDs.

We present data for a few ISCAS'89-addendum benchmarks and some
other circuits. They have different sizes, within the range of circuits manageable
by state-of-the-art reachability analysis techniques. We only report here
data for the circuits we could traverse with some gain. The benchmark circuits
we tried without any significant result are: s1269, s1423, s1512, s5378. We argue
this is mainly due to the fact that no relevant cases of independent or conditionally
independent variables are present in the state sets of those circuits. Table 1



Table 1. Comparing BDDs and Meta-BDDs in reachability analysis. 266 MHz
Pentium II, memory limit 400 MB, time limit 36000 s.

                                   BDDs                              Meta-BDDs
Circuit    FF   D   States        BDDpk     Mem      Time (Sift)    BDDpk     Mem   Time (Sift)    |R|/|R^M|
                                  [Knodes]  [MB]     [s]            [Knodes]  [MB]  [s]
s3271      116  16                -         -        Time-out       782       214   7973 (1190)    7.7
s3330      132  16  7.27·10^?     -         Mem-out  -              4534      356   22345 (17532)  4.2
FIFO       142  63  5.01·10^?     1169      45       3691 (3232)    183       24    3215 (170)     13
queue      82   45  3.43·10^?     387       27       1873 (1750)    132       21    921 (350)      19
Rotator16  32   2   1.00·2^32     65        14       25 (23)        12        5     1 (0)          1
Rotator32  64   2   1.00·2^64     -         Mem-out  -              390       17    831 (602)      1
Spinner16  33   2   1.00·2^?      30        5        7 (4)          7         4     2 (1)          1
Spinner32  65   2   1.00·2^?      -         Mem-out  -              244       20    417 (331)      1


collects statistics on the circuits used, and the results obtained. For each circuit
it first shows some common statistics: the number of latches (FF), the sequential
depth (D), and the number of reached states (States). We then compare traversals
based on the same image heuristic (IWLS95 by Ranjan et al.), with
standard BDDs and Meta-BDDs. Since conjunctively partitioned transition relations
are not critical in terms of BDD size, we use Meta-BDDs only for state
sets (and intermediate products of image computations). For both techniques we
show peak live BDD nodes (BDDpk), maximum memory usage (Mem) and CPU
time (Time) with explicit indication of sifting time. We finally show the ratio
of BDD vs. Meta-BDD size for reachable state sets ($|R|/|R^M|$).

s3271 and s3330 are known to be hard to traverse circuits, both for time 
and memory costs. FIFO is a freely modified version of the example used in 
whereas queue is a queue model from the NuSMV Q distribution. Rotator 
has two stages. An input register (subscript 16/32 is the register size) is fed by
primary inputs. An output register stores a rotated copy of the input register.
The number of rotated bits is determined by a five-bit control input. All states
are reachable, but image computation is exponentially complex, since the early
quantification scheme pays for the dependence of the output (input) register bits
on all input register (primary input) bits. Spinner is a similar circuit,
where the input register can be loaded with the output register, too. These are
both cases in which conditional independence can be efficiently factored out by
Meta-BDDs, in order to achieve relevant gains in intermediate image steps (even
though no gains are shown in reachable states).

In all cases Meta-BDDs were able to “compress” reachable state sets and 
to produce overall improvements. The first two circuits could not be completed 
with standard BDDs in the adopted experimental setup. 

Memory gains are clearly visible from peak BDD nodes, memory usage, and
the reachable state set size ratio $|R|/|R^M|$. The overhead introduced to work with
the decomposed form is visible in the reduced ratio of sifting time to total time
(except for the larger example, s3330, where sifting still dominates), and time
reductions are mainly due to the smaller BDDs involved in computations.

6 Conclusions and Future Work 

We propose a BDD based decomposition for Boolean functions, which extends
conjunctive/disjunctive decompositions and may factor out variable independences
and/or conditional independences with gains not achievable by standard
BDDs.

Our work includes and extends by proposing a representation closed 
under negation and applicable to a wider range of BDD based problems, and by 
exploring non-canonicity in terms of heuristically controlled decomposition and 
simplification steps. Experimental results on benchmark and home-made circuits
show relevant gains against standard BDDs in symbolic FSM traversals. Future
work will investigate heuristics, and application to real verification tasks.
Abstract. Formal equivalence verifiers for combinational circuits rely
heavily on BDD algorithms. However, building monolithic BDDs is often
not feasible for today's complex circuits. Thus, to increase the effectiveness
of BDD-based comparisons, divide-and-conquer strategies based on
cut-points are applied. Unfortunately, these algorithms may produce false
negatives. Significant effort must then be spent on determining whether
the failures are indeed real. In particular, if the design is actually incorrect,
many cut-point based algorithms perform very poorly. In this paper
we present a new algorithm that completely removes the problem of false
negatives by introducing normalized functions instead of free variables
at cut-points. In addition, this approach handles the propagation of input
assumptions to cut-points, is significantly more accurate in finding
cut-points, and leads to more efficient counter-example generation for
incorrect circuits. Although, naively, our algorithm would appear to be
more expensive than traditional cut-point techniques, the empirical data
on more than 900 complex signals from a recent microprocessor design
show rather the opposite.



1 Introduction 

The design process of complex VLSI systems can be thought of as a series of 
system-model transformations, leading to the final model that is implemented 
in silicon. In this paper, we concentrate on formal verification techniques estab- 
lishing logic functionality equivalence between circuit models at the RTL and 
schematic levels of abstraction. Traditionally, such techniques operate under the 
assumption that there is a 1-1 correspondence between the state nodes of the 
two circuit models, in effect transforming the problem of sequential circuit verifi- 
cation into one of combinational verification. Therefore, they are able to exploit 

^ US Patents are pending for this algorithm. 
the power of Reduced Ordered Binary Decision Diagrams (from now on called
simply BDDs). Although BDDs are very useful in this domain, they still suffer 
from exponential memory requirements on many of today’s complicated circuits. 

To overcome this, many researchers have investigated alternative solutions 
based on a divide-and-conquer approach. They attempt to partition the speci- 
fication and implementation circuits along frontiers of equivalent signal pairs 
called cut-points. The goal of the overall equivalence verification is now
transformed into one of verifying the resulting sub-circuits. This situation is
depicted in Fig. 1.




Fig. 1. Circuit Partitioning across Cut-Points. 



The two circuits $C_1$ and $C_2$ compute their outputs $(W_1, W_2)$ from their inputs
$(X_1, X_2)$. If the BDDs for the outputs as functions of the primary inputs grow
exponentially in size, one could hope to reduce the complexity of the problem
by exploiting the fact that internal nodes $Y_1$ and $Y_2$ of $C_1$ are equivalent to $Z_1$
and $Z_2$ of $C_2$, respectively. If this were the case, one could prove the equivalence
of $C_1$ and $C_2$ by first establishing the equivalence of $C_{1a}$ and $C_{2a}$ and then the
equivalence of $C_{1b}$ and $C_{2b}$. It would be expected that the sizes of the
BDDs that correspond to the sub-circuits $C_{1a}$, $C_{1b}$, $C_{2a}$ and $C_{2b}$ are considerably
smaller than the intractable sizes of $C_1$ and $C_2$, so that the verification of the
original circuits could complete. The motivation behind such cut-point based
techniques is the desire to exploit the potentially large number of similarities
between the two circuit models.

Unfortunately, CP-based techniques suffer from some serious limitations.
More specifically, when we perform the verification of $C_{1b}$ against $C_{2b}$ in Fig. 1
we consider $Y_1$, $Y_2$, $Z_1$ and $Z_2$ to be free variables (i.e., they can assume
arbitrary boolean values, with $Y_1 = Z_1$ and $Y_2 = Z_2$). This can lead to problems in
the verification of $C_{1b}$ and $C_{2b}$. For example, consider the circuits in Fig. 2.

Fig. 2. False negative as a result of the free variables at the cut-point variables
$(Y_1, Z_1)$ and $(Y_2, Z_2)$.

Let us assume that we could prove the equivalence of $Y_1$ and $Y_2$ to $Z_1$ and
$Z_2$ respectively. Then if we introduced the same free variable $A$ for $(Y_1, Z_1)$ and
$B$ for $(Y_2, Z_2)$, we would compute that $C_1$ calculates $W = A \oplus B$, while $C_2$
calculates $W = A + B$. This does not allow us to conclude that the two circuits
are equivalent. However, this is actually a false negative. The reason is that, due
to the nature of the logic that generates them, $Y_1$ and $Y_2$ cannot both be 1 at the
same time, i.e., they are mutually exclusive. The same is true for $Z_1$ and $Z_2$.
As a result, the two circuits actually produce the same $W$ functions, because for
mutually exclusive input signals, XOR and OR gates produce the same results.
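The false negative can be reproduced by enumeration. In the sketch below the XOR and OR output gates follow the text, while the logic driving the cut-points (`cut_points`, with a signal `x2` fanning out to both of them) is a hypothetical stand-in for the circuit in the figure, chosen only so that the two cut-points are mutually exclusive.

```python
from itertools import product

def c1_out(y1, y2):
    return y1 ^ y2  # C1 combines the cut-point values with XOR

def c2_out(z1, z2):
    return z1 | z2  # C2 combines the cut-point values with OR

# Free variables at the cut-points: all four value combinations are tried.
free_mismatches = [(a, b) for a, b in product((0, 1), repeat=2)
                   if c1_out(a, b) != c2_out(a, b)]

def cut_points(x1, x2, x3):
    """Hypothetical driving logic: x2 reconverges, so (1, 1) never occurs."""
    return (x1 & x2, (1 - x2) & x3)

# Only the combinations that the primary inputs can actually produce.
reachable = {cut_points(*x) for x in product((0, 1), repeat=3)}
real_mismatches = [yz for yz in sorted(reachable)
                   if c1_out(*yz) != c2_out(*yz)]
```

The only mismatch under free variables is the combination (1, 1), which the driving logic can never produce, so no real input distinguishes the two outputs.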

Given this problem, cut-point based verification algorithms usually perform
the following operations:

1. Discover as many cut-points as possible, hoping to produce the smallest
possible sub-circuits whose functions will be computed and compared with
BDDs (cp-identification).

2. Choose out of these cut-points the ones that (based on various criteria)
simplify the task of sub-circuit verification (cp-selection).

3. Perform the actual verification of the resulting sub-circuits (eq-check), and

4. Attempt to determine whether the corresponding circuit outputs that appear
as inequivalent are truly different or the algorithm produced false negatives
(fnr, false negative reduction).

A comprehensive review of the existing cut-point based algorithms appears 
in |H]. For cp-identification traditionally random simulation, automatic test 
pattern generation (ATPG) or BDD-based techniques are employed. Out of 
the cut-points so identified, some are rejected in the cp-selection stage, according 
to various criteria that are outside the scope of this paper. The remaining are 
used to form the boundaries of the sub-circuits to be verified. Then the resulting
sub-circuits are verified independently, most frequently with the use of BDDs.
If all these sub-circuits are verified equal, then the original circuits are equal.
Nevertheless, as we have shown in Fig. 2, the cut-point based algorithms can
indicate that the circuits are different as a result of false negatives.






John Moondanos et al. 



Thus, the presently known cut-point algorithms perform a final stage of false 
negative reduction (fnr). One method that is employed is that of re-substitution
of the cut-point functions. In the example of Fig. 2 we have for $C_1$ that
$W = (Y_1 \oplus Y_2)$ and for $C_2$ that $W = (Y_1 + Y_2)$, given that $Y_1 = Z_1$ and $Y_2 = Z_2$.
Although the two circuits appear to calculate different outputs based on
cut-points, if we compose into the expressions for $W$ the functions of $Y_1$ and $Y_2$,
we prove circuit equivalence. The main difficulty with this technique is that
in the worst case we might have to compute the entire function of a circuit's
output, which may be the very problem that we attempted to avoid by using
the cut-points algorithm. The method presented in |E| is based on maintaining 
multiple cut-point frontiers from which to choose the functions to be composed 
into the output functions, with the hope that some of the frontiers will lead to 
the desired results. 

Other false negative reduction techniques are based on the idea of maintaining
the set of values that the cut-points are allowed to assume. Again for
the case of the circuit of Fig. 2, we can see that the cut-point variables $(Y_1, Y_2)$
or $(Z_1, Z_2)$ can belong only to the set $\{(0, 0), (0, 1), (1, 0)\}$ since they are
mutually exclusive. Such sets are encoded by BDDs and are used to restrict the
comparisons of circuit node functions within the regions of allowed cut-point 
values. Unfortunately, maintaining and propagating these sets are often very 
difficult and computationally expensive. One other approach to the problem of 
false negative reduction is based on Automatic Test Pattern Generation tech- 
niques (ATPG) as in [S|. There, for each value combination of the cut-points that 
causes the circuit outputs to mismatch, they attempt to find (using an ATPG 
tool) the input pattern that can generate the cut-point value combination in 
question. The drawback of this algorithm is that one must call the ATPG tool 
for each cut-point value combination that causes a mismatch, until one deter- 
mines whether the mismatch is a false negative or a real design problem. This 
can be time-consuming for a large number of cut-point value combinations. 

The fundamental limitation of these approaches is that they fail to identify 
the cause of the false negatives. In this paper we identify the cause and we 
design an algorithm that allows us to eliminate false negatives early during the 
cp-identification stage of the algorithm. As the careful reader will realize this 
leads to a simpler algorithm, since we do not perform a final false negative 
reduction stage with potentially very expensive BDD operations. 



2 Cut-Point Algorithm in CLEVER 

2.1 Avoiding False-Negative Generation 

Through the example we presented in the previous section, one can understand 
that the key reason for false negatives is the fact that not all value combinations 
can appear at the nodes of a cut-point frontier. In this section we restate the 
following observation from |0| which forms the basis for the development of our 
false negative elimination approach. 
Proposition 1. In a minimized circuit without re-convergent fanout, verification
algorithms based on cut-point frontiers cannot produce false negative results.

Proof. (Sketch only) Intuitively, if there are no re-convergent fanouts then each
cut-point can be completely controlled by its fanin cone. Additionally, if the circuit
is minimized it cannot have constant nodes. As a consequence, all possible value
combinations can appear at the signals of any cut-frontier, and therefore false
negatives cannot happen. □

Proposition 1 identifies the reason for the false negative result. Clearly, the
reconvergent fanout of node $X_2$ makes $Y_1$ and $Y_2$ correlated. As a result
not all possible value combinations can appear on $Y_1$ and $Y_2$, as implied by
the introduction of free variables by some cut-point based techniques. A more
sophisticated algorithm should not assign free variables on $Y_1$ and $Y_2$, since this
will not allow the $W$ signals in the two circuits to be identified as equal. Here
we present an alternative approach, where instead of assigning a free variable
to a cut-point, we assign a function that captures the correlation between
cut-points. To make the BDD representation of this function as small as possible,
we attempt to exclude from it the effect of the non-reconverging signals in the
support of the cut-point, based on Proposition 1.

Let V be a cut-point that will be used to calculate additional cut-point
frontiers. The logic gates from the previous cut-point frontier that generate V
implement a logic function $g(r,n)$. The variables r and n correspond to
cut-points from the previous frontier. However, here, we have partitioned all these
variables into two groups. The r variables correspond to cut-points with
reconvergent fanout that leads outside of the cone of the signal V. On the other hand,
the n variables correspond to cut-points that do not have fanout re-converging
outside the cone of V. The goal here is to capture the effect of the re-converging
signals on V, so that the r variables and the free variable we introduce for V
do not assume incompatible values. We hope that by doing this, we can avoid
introducing false negatives in the process of calculating the cut-points belonging
to the next frontier.

Now, to capture the relationship between the r variables and signal V, we
examine the situations where they force V to assume a specific value, either 1
or 0. The r signal values that can force V to be 1 are the ones for which, for
every value combination of the n variables, V becomes 1. These values of r are
captured by universally quantifying out the n variables from $g(r,n)$ as in the
following function $F_g$:

$F_g = F_g(r) = \forall n.\, g(r,n)$

Subsequently, we call this function the forced term to be intuitively
reminded of its meaning in the context of cut-point algorithms (although other
naming conventions exist in the literature). Now, let us examine when the V
signal equals 0. This happens for all those r values that make $g(r,n) = 0$
regardless of the values of the n signals. So V should be 0 whenever the following
function $P_g$ is 0:

$P_g = P_g(r) = \exists n.\, g(r,n)$

This function is the result of existentially quantifying out the n variables
from $g(r,n)$ and will subsequently be called the possible term. Thus, if for
a given value combination of the r variables, all the possible combinations of
the n variables make $g(r,n) = 0$, then the free variable assigned to V in
CP-based algorithms must obtain only the 0 value. Otherwise, we will have to cope
with the potential appearance of false negatives. On the other hand, if some of
the combinations of the n variables make $g(r,n) = 0$ and some others make
$g(r,n) = 1$ for a specific r variable value combination, then V can assume either
0 or 1 independently from the r variables.

From this discussion it becomes apparent that a more appropriate assignment
to the V signal for the calculation of the cut-points in the next frontier and the
avoidance of false negatives is:

$v \cdot P_g + F_g = v \cdot (\exists n.\, g(r,n)) + (\forall n.\, g(r,n))$    (1)

We call expression (1) the normalized function for the signal V, as opposed
to the free variable that is assigned to it in other implementations of cut-point
based algorithms. We also call the variable v that we introduce for the signal V
the eigenvariable of V. To illustrate the use of normalized functions we consider
the circuit of Fig. 2. $X_2$ has reconvergent fanout that affects both $Y_1$ and $Y_2$. The
possible term for $Y_1$ is $X_2$, while the forced term is 0. Therefore, the normalized
function for $Y_1$ is $v_1 \cdot X_2 + 0 = v_1 \cdot X_2$. Similarly, for $Y_2$ the possible term is
$\overline{X_2}$, while the forced term is 0. So, $Y_2$ gets a normalized function of
$v_2 \cdot \overline{X_2} + 0 = v_2 \cdot \overline{X_2}$.
Now, signals $Z_1$ and $Z_2$ of $C_2$ get the same normalized functions as $Y_1$ and $Y_2$
respectively, since they implement the same functions in $C_2$ as their counterparts
in $C_1$. So, the function for W in $C_1$ becomes $v_1 \cdot X_2 \oplus v_2 \cdot \overline{X_2}$, while in $C_2$ we have
that W implements $v_1 \cdot X_2 + v_2 \cdot \overline{X_2}$. These two expressions are clearly equal since
the two terms comprising them cannot be 1 at the same time. One can prove
that the use of normalized functions solves the problem of false negatives in
the general case as well. This is based on the fact that the range of a vector
of Boolean functions is identical to the range of the corresponding vector of
normalized functions. A rigorous proof of this claim appears in Appendix A.
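
The construction can be tried out on a toy instance. The following is a minimal sketch, not the CLEVER implementation: it uses truth-table enumeration in place of BDDs, and the gate functions (Y1 = X1 AND X2, Y2 = NOT X2 AND X3, with W formed by XOR in one circuit and OR in the other) are assumptions chosen to match the discussion above. All function names are hypothetical.

```python
from itertools import product

BITS = (0, 1)

def forced_term(g, n_vars):
    # F_g(r) = forall n. g(r, n), computed by enumerating all n
    return lambda r: int(all(g(r, n) for n in product(BITS, repeat=n_vars)))

def possible_term(g, n_vars):
    # P_g(r) = exists n. g(r, n), computed by enumerating all n
    return lambda r: int(any(g(r, n) for n in product(BITS, repeat=n_vars)))

def normalized(g, n_vars):
    # Normalized function v.P_g(r) + F_g(r), with v the eigenvariable
    P, F = possible_term(g, n_vars), forced_term(g, n_vars)
    return lambda v, r: (v & P(r)) | F(r)

# Assumed gates: r = (X2,) is the reconverging input,
# n = (X1,) for Y1 and n = (X3,) for Y2.
y1 = lambda r, n: n[0] & r[0]          # Y1 = X1 AND X2
y2 = lambda r, n: n[0] & (1 - r[0])    # Y2 = (NOT X2) AND X3
ny1, ny2 = normalized(y1, 1), normalized(y2, 1)

def outputs_agree_normalized():
    # Compare W = Y1 XOR Y2 with W = Y1 OR Y2 using normalized functions
    return all(
        (ny1(v1, (x2,)) ^ ny2(v2, (x2,))) == (ny1(v1, (x2,)) | ny2(v2, (x2,)))
        for v1, v2, x2 in product(BITS, repeat=3)
    )

def outputs_agree_free():
    # Same comparison with unconstrained free variables on Y1 and Y2
    return all((a ^ b) == (a | b) for a, b in product(BITS, repeat=2))
```

With free variables the two outputs differ when both cut-points are 1, which is the false negative; with normalized functions that combination never arises, so the comparison succeeds.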

2.2 Cut-Point Algorithm Implementation 

For the comparison of the functions implemented by nodes and Ui of the 
specification (spec) and the implementation (imp) models, we set the cut-point 
frontier to be initially the primary inputs of the cones that are driving the two 
signals. Then repeatedly we attempt to identify more cut-points lying ahead 
of the current cut-point frontier closer to and Ui with the hope that even- 
tually we will build the BDDs for these two nodes, so that we can compare 
them for equivalence. These BDDs are built not by assigning free variables for 
the cut-points on the present frontier but by assigning to every cut-point its 
corresponding normalized function. 

As we see, our CLEVER cut-point based algorithm departs from the classical
approach by combining the cp-identification and false negative reduction
phases. The main benefit with respect to pre-existing algorithms is that
CLEVER correctly identifies the root cause of false negatives and tries to avoid
creating them, so that it does not perform expensive operations to correct them. 
Furthermore, employing normalized functions allows us to correctly identify all 
internal signals of equal functionality that exist within the cones of the signals 
being compared. False negative elimination is usually not done for internal cut- 
points, and previous algorithms fail to identify every pair of sub-circuits with 
identical functionality. Thus, the algorithm presented here has the opportunity 
to work on smaller verification sub-problems, since it identifies all possible cut- 
points. In addition we do not have to perform any circuit transformations to 
increase the similarity of the circuit graphs. 

One additional area where the presented approach with the use of normalized 
functions is fundamentally different from previous techniques is the generation 
of counter examples and the debugging of faulty circuits. In methods like the 
ones in 0, 0, when the outputs are not proven equal based on the last 

frontier, one does not know whether this is due to a false negative or a real circuit 
bug. Here, we do not have to perform a false negative elimination step, since we 
know that the difference of the outputs must always be due to the presence of a 
bug. In contrast to other algorithms that require the resubstitution of the cut- 
point variables by their driving functions, when we employ normalized functions 
there exists an efficient and simple algorithm that does not require large amounts 
of memory. 

The validity of this algorithm is based again on the theory of Appendix A, 
where we show that the range of a function is identical to the range of its 
normalized version. Intuitively, the counter-example that we can produce for the 
outputs based on the signals of the last frontier will be in terms of values of 
eigenvariables and reconverging primary inputs. The goal is to use these values 
of eigenvariables and reconverging inputs to compute the corresponding values of 
the non-reconverging primary inputs. These must be computed to be compatible 
with the internal signal values implied by the cut-point assignment that was 
selected to expose the difference of the outputs. 
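
The idea of completing a counter-example can be illustrated on a single cut-point. The sketch below is an assumption-laden illustration, not the algorithm referred to in the text: it searches exhaustively for values of the non-reconverging inputs n that make g(r, n) agree with the value the normalized function takes for the given eigenvariable v and reconverging inputs r. The name `lift_counterexample` and the sample function g are hypothetical.

```python
from itertools import product

def lift_counterexample(g, v, r, n_vars):
    """Return values for the non-reconverging inputs n such that g(r, n)
    equals the normalized-function value v.P_g(r) + F_g(r)."""
    candidates = list(product((0, 1), repeat=n_vars))
    possible = any(g(r, n) for n in candidates)   # P_g(r)
    forced = all(g(r, n) for n in candidates)     # F_g(r)
    target = int((v and possible) or forced)
    # A matching n always exists: target 1 implies g can be 1 for this r,
    # and target 0 implies g is not forced to 1 for this r.
    return next(n for n in candidates if g(r, n) == target)

# Hypothetical cut-point function: g = (r0 AND n0) OR n1
g = lambda r, n: (r[0] & n[0]) | n[1]
```

For example, `lift_counterexample(g, 1, (1,), 2)` yields an assignment of n under which g evaluates to 1, and with v = 0 it yields one under which g evaluates to 0.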

Finally, one additional area where our cut-point based techniques can be 
contrasted with pre-existing approaches is the area of input assumption han- 
dling. This topic is not usually treated in publications of cut-point based algo- 
rithms. However, in our experience logic models of modern designs contain many 
assumptions on input signal behavior, without which the task of formal equiv- 
alence verification is impossible. In the case of our cut-point based algorithms 
we employ parametric representations to encode input assumptions as described 
in m- Normalized functions are ideally suited to capture the effect of boolean 
constraints on the inputs. This is the case because the validity of our algorithm 
still holds if it is invoked on parametric variables encoding the inputs of a cir- 
cuit rather than the actual inputs themselves. One could even argue that the 
normalized functions are a parametric representation of the function driving a 
cut-point in terms of its eigenvariable and its reconverging input signals.

3 Results

The algorithms that are presented in this paper were developed to enable the
equivalence verification of a set of difficult-to-verify signals from a next-generation
microprocessor design. The results of the application of CLEVER on these
complex circuit cones appear in Table 1. Note that this table lists results for the
RTL to schematic netlist comparison. In Table 1, the numbers of the
problematic signals for each circuit appear in the Prb column. The term problematic
here refers to signals whose comparison was not possible by means of monolithic 
BDDs even though all possible ordering heuristics were exhausted (static and 
dynamic ordering). The column IPs lists the average number of inputs per sig- 
nal cone. The next six columns of the table are partitioned into two sections, 
one for the SPEC model (the RTL) and one for the IMP (the transistor netlist). 
The Comp column refers to the average size in composite gates of the cones 
of the problematic signals. The CP% column lists the percentage of nodes in a 
model that are found by the classic cut-point algorithm to have nodes of iden- 
tical functionality in the other model. Similarly, the NCP% column lists the 
percentage of nodes in a model that are found by the cut-point algorithm with 
normalized functions to have corresponding nodes of identical functionality in 
the other model. Finally, the CP and NCP columns list how many signal com- 
parisons were completed by the classic cut-point algorithm with resubstitution 
and the cut-point algorithm with normalized functions, respectively. 



Table 1. Statistics about the logic cones driving various problematic signals.

                    SPEC (RTL)            IMP (Netlist)
Ckt   Prb   IPs     Comp   CP%   NCP%     Comp   CP%   NCP%     CP    NCP
...   343   ...   60%   ...   56%   ...
...   ...   ...     980    25%   69%      570    24%   39%      0     8
C4    96    130     1040   72%   84%      410    48%   51%      76    96
C5    15    260     810    20%   60%      650    25%   45%      0     15


One important detail becomes immediately evident from Table 1. The use
of normalized functions in our cut-point techniques helps us identify a higher
number of cut-points than the classic algorithm. This is the case because our
cut-point based techniques do not produce false negatives. Clearly we identify more
cut-points in the RTL model, but the difference becomes much more dramatic
in the case of the logic model for the transistor netlist. This is to be expected
because the logic model for the transistor netlist is more compact, since it
comes from the minimized model for the circuit implementation and as a result
has many nodes with reconverging fanout. These nodes cause the classic
cut-point algorithm to produce many false negatives and fail to correctly identify
many cut-points. As a result, the cut-point algorithm with normalized functions
manages to complete the verification of approximately 200 more signals out
of 900. In addition, the C4 netlist contained 14 output signals which initially
were not equivalent with their specification models. These signals were debugged
using Algorithm 2, since the classic algorithm with BDD resubstitution could
not handle them.

The plot in Fig. 3 indicates the time requirements (in CPU seconds) for the signal
comparisons on HP Unix workstations with 512MB of RAM. The horizontal
axis corresponds to the time it takes the classic cut-point algorithm (CP) for the
comparisons of the signals in Table 1. The vertical axis corresponds to the time
it takes the cut-point algorithm with normalized functions (NCP) for the same
comparisons. The 200 signal comparisons for which the CP algorithm timed out
are arbitrarily placed at the bottom right of the plot only to indicate the time
it took the NCP algorithm to finish them. The diagonal line partitions the plot
into two areas indicating which algorithm performed better.




Fig. 3. Time Comparison of Cut-Points with Resubstitution and Cut-Points
with Normalized Functions. (Horizontal axis: CP CPU sec.)



Figure 4 shows the memory requirements (in MB) for the signal comparisons
from Table 1. The dashed lines correspond to the classic cut-point algorithm
and the dots to the one with normalized functions. Signals are sorted according
to increasing verification time and we can identify the point where memory
requirements become exponential for the classic cut-point algorithm. The key
observation here is that cut-point techniques with normalized functions require
constant amounts of memory for a larger number of signals. This happens
because normalized functions detect every possible cut-point, thus resulting in
smaller verification problems and more controlled memory requirements. The
second observation is that the classic cut-point algorithm with BDD
resubstitution requires more memory. This happens for two reasons. First, it may
not produce a cut-point frontier close to the output because it fails to identify
cut-points due to false negatives, so it creates bigger BDDs for the circuit
output. Second, if the outputs cannot be proven equal, the
classic cut-point algorithm needs to perform BDD resubstitution. This is
necessary to determine whether the signal inequivalence is real or a false negative. As
a result the memory requirements for the BDDs that get created are increased.




Fig. 4. Memory Consumption Comparison between Cut-Points and Cut-Points 
with Normalized Functions. 



4 Summary 

CLEVER is a tool for the formal equivalence verification of next generation
microprocessor designs and employs a number of different engines for the
verification of combinational circuits. Among them, one is based on BDD
technology using state-of-the-art algorithms for monolithic BDDs. As is well
known, monolithic BDDs suffer from certain limitations. For this reason CLEVER
employs circuit divide and conquer techniques based on BDDs. In this paper,
we have presented the main idea behind the divide and conquer algorithm in
CLEVER. This algorithm is based on the concept of function normalization to
provide an efficient means for avoiding the “false negatives problem” that
appears in other combinational verification techniques based on circuit partitioning.
In addition, function normalization readily lends itself to simple counter-example
generation and comprehensive handling of input assumptions. As a result, we are
able to apply divide and conquer techniques for the comparison of complicated
combinational signals, even in cases where the degree of similarity between the
circuit models is limited to between 20% and 30%.

Appendix A: Proofs

Let $B = \{0, 1\}$ and $R$ be a subset of $B^k$. Now $R$ contains vectors of the form
$s = \langle s_1, s_2, \ldots, s_k \rangle$. Traditionally, such a set is represented by its characteristic
function $\mathcal{R}(s) = \mathcal{R}(s_1, s_2, \ldots, s_k)$, which becomes 1 iff $s \in R$. If we consider the
variables $s_i$ to model signal values of a logic circuit model, we can view any
set $\mathcal{R}(s) = \mathcal{R}(s_1, s_2, \ldots, s_k)$ as a signal relation. If there is actually no relation
between the signals, then $\mathcal{R}(s) = 1$. Also, let $G(s) = \langle g_1(s), g_2(s), \ldots, g_k(s) \rangle$
be a Vector Boolean Function, where $s = \langle s_1, s_2, \ldots, s_m \rangle$ are the function
inputs. Each $g_i(s)$ can be written as $g_i(r_i, n_i)$, where:

- $r_i$ are the variables on which some other $g_j$, $j \neq i$, depends. These will be
  called the re-converging variables.

- $n_i$ are the variables on which no other $g_j$, $j \neq i$, depends. These will be called
  the non-reconverging variables.

Now let us re-introduce the concepts of Forced and Possible Terms of Boolean
Vector Functions. Let $\{x\}$ stand for all possible value combinations of
$x = \langle x_1, x_2, \ldots, x_m \rangle$ in $B^m = \{0, 1\}^m$. There are $2^m$ such combinations.

For $g_i(r_i, n_i)$ we define its Possible Term $P_{g_i}$ as:

$P_{g_i} = P_{g_i}(r_i) = \exists n_i.\, g_i(r_i, n_i)$    (2)

and its Forced Term $F_{g_i}$ as:

$F_{g_i} = F_{g_i}(r_i) = \forall n_i.\, g_i(r_i, n_i)$    (3)

The following lemmas follow directly from the properties of existential and
universal quantification.

Lemma 1.

$P_{g_i}(r_i) = 0 \Rightarrow F_{g_i}(r_i) = 0$

Lemma 2.

$F_{g_i}(r_i) = 1 \Rightarrow P_{g_i}(r_i) = 1$

Also, let $\partial G = \langle \partial g_1, \partial g_2, \ldots, \partial g_k \rangle$ stand for the Normalized Function of
$G$, where $G$ is a Boolean vector function. More specifically, let us define $\partial G$ as

$\langle \partial g_1(v_1, r_1), \partial g_2(v_2, r_2), \ldots, \partial g_k(v_k, r_k) \rangle$

where

$\partial g_i(v_i, r_i) = v_i \cdot P_{g_i}(r_i) + F_{g_i}(r_i)$

The variable $v_i$ is a free Boolean variable, and is called the eigenvariable of $g_i$.

Also, let $[G]$ stand for the Range of the vector Boolean Function
$G(s) = \langle g_1(s), g_2(s), \ldots, g_k(s) \rangle$. Then $[G]$ is defined as:

$[G] = \{ b \in B^k \mid \exists s : b = \langle g_1(s), g_2(s), \ldots, g_k(s) \rangle \}$

where $s = \langle s_1, s_2, \ldots, s_m \rangle \in B^m$, and $b = \langle b_1, b_2, \ldots, b_k \rangle \in B^k$.

The main result of our algorithm is captured in the following theorem.

Theorem 1. The Range of a function $G(s)$ is identical to the range of its
normalized function $\partial G(s)$,

$[G(s)] \equiv [\partial G(s)]$

We repeat that $\partial g_i$ is a function of the eigenvariable $v_i$ of $g_i$ and its
reconverging variables $r_i$, i.e. $\partial g_i = \partial g_i(v_i, r_i)$.

Similarly, we can say $\partial G = \partial G(v, r)$ where $v = \langle v_1, v_2, \ldots, v_k \rangle$ are the
eigenvariables of $g_1, g_2, \ldots, g_k$ and $r = \langle r_1, r_2, \ldots, r_k \rangle$ are the
reconverging variables of $g_1, g_2, \ldots, g_k$, respectively.

Keep in mind $G$ has $k$ functions of at most $m$ variables each.

We will now prove Theorem 1.

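Proof: Initially we will prove that $[G] \subseteq [\partial G]$.

Let $b = \langle b_1, b_2, \ldots, b_k \rangle \in [G]$, where $b_i \in \{0, 1\}$.

$\Rightarrow \exists s' = (s'_1, s'_2, \ldots, s'_m) : b_i = g_i(s'_1, s'_2, \ldots, s'_m),\ i = 1, \ldots, k$

$\Rightarrow \exists r'_i, n'_i$ derived from $s'$ : $b_i = g_i(r'_i, n'_i),\ i = 1, \ldots, k$

Now, if $b_i = 1 \Rightarrow g_i(r'_i, n'_i) = 1 \Rightarrow P_{g_i}(r'_i) = 1$ according to formula (2). If
we select the eigenvariable value $v'_i = 1$ we get that
$\partial g_i(v'_i, r'_i) = v'_i \cdot P_{g_i}(r'_i) + F_{g_i}(r'_i) = 1 \cdot 1 + F_{g_i}(r'_i) = 1 = b_i$.

On the other hand, if $b_i = 0 \Rightarrow g_i(r'_i, n'_i) = 0 \Rightarrow F_{g_i}(r'_i) = 0$ according to
formula (3). If we select the eigenvariable value $v'_i = 0$ we get that
$\partial g_i(v'_i, r'_i) = v'_i \cdot P_{g_i}(r'_i) + F_{g_i}(r'_i) = 0 \cdot P_{g_i}(r'_i) + 0 = 0 = b_i$.

So, for the bit pattern $b = \langle b_1, b_2, \ldots, b_k \rangle \in [G]$, we have created a pattern
$\langle \partial g_1(v'_1, r'_1), \ldots, \partial g_k(v'_k, r'_k) \rangle \in [\partial G]$, which is identical to $b$.
So, $b \in [\partial G]$ and $[G] \subseteq [\partial G]$.

To complete the proof of Theorem 1 we also need to prove: $[\partial G] \subseteq [G]$.

To prove this, let $b = \langle b_1, b_2, \ldots, b_k \rangle \in [\partial G]$.

$\Rightarrow \exists (v', r') : b = \partial G(v', r')$

$\Rightarrow \exists (v'_i, r'_i) : b_i = \partial g_i(v'_i, r'_i) = v'_i \cdot P_{g_i}(r'_i) + F_{g_i}(r'_i),\ i = 1, \ldots, k$

Now, consider the case $b_i = 0$. Then $b_i = \partial g_i(v'_i, r'_i) = 0 \Rightarrow F_{g_i}(r'_i) = 0$
$\Rightarrow \forall n_i.\, g_i(r'_i, n_i) = 0 \Rightarrow \exists n'_i : g_i(r'_i, n'_i) = 0 = b_i$.

On the other hand, if $b_i = 1 \Rightarrow P_{g_i}(r'_i) = 1$. Otherwise, if $P_{g_i}(r'_i) = 0 \Rightarrow$
$F_{g_i}(r'_i) = 0$ according to the lemmas presented previously. This would make
$v'_i \cdot P_{g_i} + F_{g_i} = 0$ while we assumed it is 1.

But, if $P_{g_i}(r'_i) = 1 \Rightarrow \exists n_i.\, g_i(r'_i, n_i) = 1 \Rightarrow \exists n'_i : g_i(r'_i, n'_i) = 1 = b_i$.

Since the non-reconverging variables $n_i$ of different $g_i$ are disjoint by
definition, the values $n'_i$ can be chosen independently for each $i$.
So, for $b = \langle b_1, b_2, \ldots, b_k \rangle \in [\partial G]$, we can construct another bit pattern
$\langle g_1(r'_1, n'_1), \ldots, g_k(r'_k, n'_k) \rangle = \langle b_1, b_2, \ldots, b_k \rangle$, which $\in [G]$. Therefore,
$[\partial G] \subseteq [G]$, which completes our proof. □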
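
Theorem 1 can be sanity-checked by brute force on a small vector function. The sketch below is only an illustration under stated assumptions: each component is given as a pair (function, set of input indices it depends on), the supports are supplied by hand, and the example functions g1 and g2 are hypothetical. It enumerates $[G]$ and $[\partial G]$ explicitly rather than using BDDs.

```python
from itertools import product

BITS = (0, 1)

def range_of_G(G, m):
    """[G]: all output vectors reachable over inputs s in B^m."""
    return {tuple(g(s) for g, _ in G) for s in product(BITS, repeat=m)}

def range_of_normalized(G, m):
    """[dG]: outputs of v_i.P_i(r_i) + F_i(r_i) over all v and r values."""
    k = len(G)
    result = set()
    for s in product(BITS, repeat=m):
        terms = []
        for i, (g, support) in enumerate(G):
            # Inputs used by some other component are reconverging for g_i
            shared = set().union(*(sup for j, (_, sup) in enumerate(G) if j != i))
            n_idx = [x for x in sorted(support) if x not in shared]
            # Values of g_i over all settings of its non-reconverging inputs
            vals = []
            for n in product(BITS, repeat=len(n_idx)):
                t = list(s)
                for x, bit in zip(n_idx, n):
                    t[x] = bit
                vals.append(g(tuple(t)))
            terms.append((int(any(vals)), int(all(vals))))  # (P_i, F_i)
        for v in product(BITS, repeat=k):
            result.add(tuple((v[i] & P) | F for i, (P, F) in enumerate(terms)))
    return result

# Hypothetical example over inputs s = (x1, x2, x3); x2 is shared.
g1 = lambda s: s[0] & s[1]
g2 = lambda s: (1 - s[1]) & s[2]
G = [(g1, {0, 1}), (g2, {1, 2})]
```

For this example both ranges contain exactly the three vectors (0,0), (0,1) and (1,0); the vector (1,1), which free variables would allow, is excluded from both.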

To establish the false-negative elimination claim, consider now the function
$G$ that is computed by forming the exclusive-OR of every pair of outputs from
the two circuits to be compared. Since the range of $\partial G$ is equal to the range of
$G$, our claim follows trivially.

Abstract. We introduce a decision procedure for satisfiability of equiv-
alence logic formulas with uninterpreted functions and predicates. In a 
previous work ([PRSS99]) we presented a decision procedure for this 
problem which started by reducing the formula into a formula in equal- 
ity logic. As a second step, the formula structure was analyzed in order 
to derive a small range of values for each variable that is sufficient for 
preserving the formula’s satisfiability. Then, a standard BDD based tool 
was used in order to check the formula under the new small domain. In 
this paper we change the reduction method and perform a more careful 
analysis of the formula, which results in significantly smaller domains. 
Both theoretical and experimental results show that the new method is 
superior to the previous one and to the method suggested in [BGV99]. 



1 Introduction 

Deciding equivalence between formulas with uninterpreted functions is of major 
importance due to the broad use of uninterpreted functions in abstraction. Such 
abstraction can be used, for example, when checking a control property of a 
microprocessor, and it is sufficient to specify that the operations which the ALU 
performs are functions, rather than specifying what these operations are. Thus, 
by representing the ALU as an uninterpreted function, the verification process 
avoids the complexity of the ALU. This is the approach taken, for example, in 
[BD94], where a formula with uninterpreted functions is generated, such that its 
validity implies the equivalence between the CPU checked and another version 
of it, without a pipeline. Another example is given in [PSS98], where formulas 
with uninterpreted functions are used for translation validation, a process in 
which the correct translation of a compiler is verified by proving the equivalence 
between the source and target codes after each run. 

In the past few years several different BDD-based procedures for checking
satisfiability of such formulas have been suggested (in contrast to earlier
decision procedures that are based on computing congruence closure [BDL96] in
combination with case splitting). Typically the first step of these procedures is
the reduction of the original formula $\varphi$ to an equality formula (a propositional
formula plus the equality sign) $\psi$ such that $\psi$ is satisfiable iff $\varphi$ is. As a second
step, different procedures can be used for checking $\psi$.

Goel et al. suggest in [GSZAS98] to replace all comparisons in $\psi$ with new
Boolean variables, and thus create a new Boolean formula $\psi'$. The BDD of $\psi'$ is
calculated ignoring the transitivity constraints of comparisons. They then
traverse the BDD, searching for a satisfying assignment that will also satisfy these
constraints. Bryant et al. suggested in [BV00] to avoid this potentially
exponential traversal algorithm by explicitly computing a small set of constraints that
are sufficient for preserving the transitivity constraints of equality. By checking
$\psi'$ conjoined with these constraints using a regular BDD package they were able
to verify larger designs.

In [PRSS99] we suggested a method in which the Ackermann reduction
scheme [Ack54] is used to derive $\psi$, and then $\psi$'s satisfiability is decided by
assigning a small domain for each variable, such that $\psi$ is satisfiable if and only
if it is satisfiable under this small domain. To find this domain, the equalities in
the formula are represented as a graph, where the nodes are the variables and the
edges are the equalities and disequalities (disequality standing for $\neq$) in $\psi$. Given
this graph, a heuristic called range allocation is used in order to compute a small
set of values for each variable. To complete the process, a standard BDD based
tool is used to check satisfiability of the formula under the computed domain.

While both the [PRSS99] and [GSZAS98] methods can be applied to any equality
formula, Bryant et al. suggest in [BGV99] to examine the structure of the original
formula $\varphi$. They prove that if the original formula $\varphi$ uses comparisons between
variables and functions only in a certain syntactically restricted way (denoted
positive equality), the domain of the reduced formula can be restricted to a unique
single constant for each variable. This result can also be applied for only subsets
of variables (and functions) in the formula that satisfy this condition. However,
this result cannot be obtained using Ackermann’s reduction. Rather they use
the reduction proposed in [BV98].

The method which we propose in this paper roughly uses the framework we 
suggested in [PRSS99]. We will use the reduction scheme suggested in [BV98] 
(rather than Ackermann’s scheme) in order to generalize their result in the case 
of positive equality formulas. We also show how this shift, together with a more 
careful analysis of the formula structure, allows for a construction of a different 
graph, which results in a provably smaller domain. The smaller implied state 
space is crucial, as our experiments have shown, for reducing the verification 
time of these formulas. 

2 Preliminaries and Definitions 

We define the logic of equality with uninterpreted functions formally. The syntax
of this logic is defined as follows:

⟨Formula⟩ ← ⟨Boolean-Variable⟩ |
            ⟨Predicate-Symbol⟩(⟨Term⟩, ..., ⟨Term⟩) |
            ⟨Term⟩ = ⟨Term⟩ | ¬⟨Formula⟩ | ⟨Formula⟩ ∨ ⟨Formula⟩

⟨Term⟩ ← ⟨Term-Variable⟩ |
         ⟨Function-Symbol⟩(⟨Term⟩, ..., ⟨Term⟩) |
         ITE(⟨Formula⟩, ⟨Term⟩, ⟨Term⟩)

We refer to formulas in this logic as UF-formulas. We say that a UF-formula $\varphi$
is satisfiable iff there is some interpretation $M$ of the variables, functions and
predicates of $\varphi$, such that $M \models \varphi$.

An equivalence logic formula (denoted E-formula) is a UF-formula that does
not contain any function and predicate symbols. Throughout the paper we use
$\varphi$ and $\psi$ to denote UF-formulas and E-formulas, respectively.

We allow our formulas to contain let constructs of the form let $X = \psi$ in
$\varphi(X)$, which allows term sharing or the representation of circuits.

For simplicity of presentation, we will treat UF-formulas with no Boolean
variables and predicates. Also, we will assume there are no ITE terms, and every
uninterpreted function has just one argument. All these extensions, including the
full proofs and examples, are handled in the full version of the paper [PRS01].

3 Deciding Satisfiability of E-Formulas 

We wish to check the satisfiability of an E-formula $\psi$ with variables $V$. In theory
this implies that we need to check whether there exists some instantiation of $V$
that satisfies $\psi$. Since $\psi$ only queries equalities on the variables in $V$, it enjoys
the small model property, which means that it is satisfiable iff it is satisfiable over
a finite domain. It is not hard to see that the finite domain implied by letting
each variable in $V$ range over $\{1 \ldots |V|\}$ is sufficient. However, this approach
is not very practical, since it leads to a state space of $|V|^{|V|}$.
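
The naive use of the small model property can be written down directly. The following sketch is an illustration only, with hypothetical names: an E-formula is modelled as a Python predicate over an assignment dictionary, and every variable ranges over {1, ..., |V|}, so up to $|V|^{|V|}$ assignments are tried.

```python
from itertools import product

def satisfiable_small_model(variables, formula):
    """Try every assignment over the domain {1..|V|}; return a satisfying
    assignment as a dict, or None if the formula is unsatisfiable."""
    domain = range(1, len(variables) + 1)
    for values in product(domain, repeat=len(variables)):
        alpha = dict(zip(variables, values))
        if formula(alpha):
            return alpha
    return None

# Two sample E-formulas over variables a, b, c (assumed for illustration)
psi = lambda al: al["a"] == al["b"] and (al["c"] != al["b"] or al["a"] == al["c"])
unsat = lambda al: al["a"] == al["b"] and al["b"] == al["c"] and al["a"] != al["c"]
```

Even with three variables this explores up to 27 assignments, which is why the growth of the state space motivates the refined analysis that follows.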

In [PRSS99] we suggested a more refined analysis, where rather than
considering only $|V|$, we examine the actual structure of $\psi$, i.e. the equalities and
disequalities in $\psi$. This analysis enables the derivation of a state space which
is empirically much smaller than $|V|^{|V|}$. In this section we repeat the essential
definitions from this work, except for several changes which are necessary for the
new techniques that will be presented in later sections.

Definition 1. (E-Graphs): An E-graph $\mathcal{G}$ is a triplet $\mathcal{G} = (V, EQ, DQ)$, where
$V$ is the set of vertices, and $EQ$ (Equality edges) and $DQ$ (Disequality edges)
are sets of unordered pairs of vertices.

Given an E-graph $\mathcal{G} = (V, EQ, DQ)$, we let $V(\mathcal{G}) = V$, $DQ(\mathcal{G}) = DQ$ and
$EQ(\mathcal{G}) = EQ$. We use $<$ to denote the sub-graph relation: $\mathcal{H} < \mathcal{G}$ iff $V(\mathcal{H}) =
V(\mathcal{G})$, $EQ(\mathcal{H}) \subseteq EQ(\mathcal{G})$ and $DQ(\mathcal{H}) \subseteq DQ(\mathcal{G})$. We will use E-graphs to
represent partial information derived from the structure of a given E-formula; they
can be viewed as a conservative abstraction of E-formulas.

We say that an assignment $\alpha$ (assigning values to the variables in $V$) satisfies
edge $(a, b)$ if $(a, b)$ is an equality edge and $\alpha(a) = \alpha(b)$, or if $(a, b)$ is a disequality
edge and $\alpha(a) \neq \alpha(b)$. We write $\alpha \models \mathcal{G}$ if $\alpha$ satisfies all edges of $\mathcal{G}$. $\mathcal{G}$ is said to
be satisfiable if there exists some $\alpha$ such that $\alpha \models \mathcal{G}$.



Construction of E-Graph $\mathcal{G}(\psi)$: For an E-formula $\psi$ we construct the
E-graph $\mathcal{G}(\psi)$ (this is a construction suggested in [PRSS99]) by placing a node in
$\mathcal{G}(\psi)$ for each variable of $\psi$, and a (dis)equality edge for each (dis)equality term
of $\psi$. By “equality” term we mean that the equality term appears under an
even number of negations, and by “disequality”, under an odd number.

Example 1. The E-formula $\psi_1 = (a = b) \wedge (\neg(c = b) \vee (a = c))$ results in the
E-graph:

$\mathcal{G}(\psi_1) = (\{a, b, c\}, \{(a, b), (a, c)\}, \{(c, b)\})$

Notice that every proper subgraph of $\mathcal{G}(\psi_1)$ is satisfiable.
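
The construction amounts to a traversal that tracks the parity of enclosing negations. The sketch below is a minimal illustration under assumptions: formulas are encoded as nested tuples with the hypothetical tags "eq", "not", "or" and "and", and edges are stored as frozensets to model unordered pairs.

```python
def e_graph(formula):
    """Build (V, EQ, DQ) from a formula given as nested tuples."""
    V, EQ, DQ = set(), set(), set()

    def walk(f, negations):
        op = f[0]
        if op == "eq":
            _, a, b = f
            V.update((a, b))
            # Even number of negations: equality edge; odd: disequality edge
            (EQ if negations % 2 == 0 else DQ).add(frozenset((a, b)))
        elif op == "not":
            walk(f[1], negations + 1)
        else:  # "and" / "or" keep the polarity of both operands
            walk(f[1], negations)
            walk(f[2], negations)

    walk(formula, 0)
    return V, EQ, DQ

# The formula of Example 1: (a = b) AND (NOT(c = b) OR (a = c))
psi1 = ("and", ("eq", "a", "b"),
               ("or", ("not", ("eq", "c", "b")), ("eq", "a", "c")))
```

Applied to `psi1`, this yields the vertices a, b, c, equality edges (a, b) and (a, c), and the disequality edge (c, b), matching the E-graph of Example 1.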

The important property of $\mathcal{G}(\psi)$ is that any two assignments $\alpha_1$ and $\alpha_2$ that
satisfy exactly the same edges of $\mathcal{G}(\psi)$ will give the same result for $\psi$; i.e.,
$\alpha_1 \models \psi$ iff $\alpha_2 \models \psi$. This means that if $\psi$ is satisfiable, then there is some
satisfiable $\mathcal{H} < \mathcal{G}(\psi)$ such that every assignment that satisfies all edges of $\mathcal{H}$
will satisfy $\psi$ (this $\mathcal{H}$ consists of all the edges of $\mathcal{G}(\psi)$ that are satisfied by $\psi$’s
satisfying assignment). We wish to generalize this property of $\mathcal{G}(\psi)$.

Definition 2. (Adequacy of E-Graphs to E-Formulas): An E-graph $\mathcal{G}$ is
adequate for E-formula $\psi$, if either $\psi$ is not satisfiable, or there exists a satisfiable
$\mathcal{H} < \mathcal{G}$ such that for every assignment $\alpha$ such that $\alpha \models \mathcal{H}$, $\alpha \models \psi$.

For example, $\mathcal{G}(\psi)$ is adequate for $\psi$. We use the fact that an E-graph is adequate
for $\psi$ for finding a small set of assignments that will be sufficient for checking $\psi$:

Definition 3. (Adequacy of Assignment Sets to E-Graphs): Given an E-graph
$\mathcal{G}$, and $R$, a set of assignments to $V(\mathcal{G})$, we say that $R$ is adequate for $\mathcal{G}$ if for
every satisfiable $\mathcal{H} < \mathcal{G}$, there is an assignment $\alpha \in R$ such that $\alpha \models \mathcal{H}$.

Proposition 1. If E-graph $\mathcal{G}$ is adequate for $\psi$, and assignment set $R$ is
adequate for $\mathcal{G}$, then $\psi$ is satisfiable iff there is $\alpha \in R$ such that $\alpha \models \psi$.

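Example 2. For our E-formula $\psi_1$ of Example 1, the following set is adequate
for $\mathcal{G}(\psi_1)$:

$R = \{(a \leftarrow 0, b \leftarrow 0, c \leftarrow 0), (a \leftarrow 0, b \leftarrow 0, c \leftarrow 1), (a \leftarrow 0, b \leftarrow 1, c \leftarrow 0)\}$

Indeed, the assignment $(a \leftarrow 0, b \leftarrow 0, c \leftarrow 0) \in R$ satisfies $\psi_1$.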
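
Adequacy of an assignment set can be checked mechanically for small graphs. The sketch below is an exhaustive illustration, not the range allocation procedure of [PRSS99]: it enumerates every subgraph, decides its satisfiability by trying all assignments over a domain of size |V|, and then looks for a satisfying member of R. Function names and the data encoding are assumptions.

```python
from itertools import combinations, product

def satisfies(alpha, EQ, DQ):
    """alpha satisfies all equality edges in EQ and disequality edges in DQ."""
    return (all(alpha[a] == alpha[b] for a, b in EQ)
            and all(alpha[a] != alpha[b] for a, b in DQ))

def graph_satisfiable(V, EQ, DQ):
    # A domain of |V| values is enough to decide satisfiability
    names = sorted(V)
    return any(satisfies(dict(zip(names, vals)), EQ, DQ)
               for vals in product(range(len(names)), repeat=len(names)))

def subgraphs(EQ, DQ):
    """Yield every (EQ', DQ') with EQ' a subset of EQ and DQ' a subset of DQ."""
    edges = [("eq", e) for e in EQ] + [("dq", e) for e in DQ]
    for k in range(len(edges) + 1):
        for chosen in combinations(edges, k):
            yield ([e for t, e in chosen if t == "eq"],
                   [e for t, e in chosen if t == "dq"])

def is_adequate(R, V, EQ, DQ):
    """Every satisfiable subgraph must be satisfied by some assignment in R."""
    return all(any(satisfies(alpha, eq, dq) for alpha in R)
               for eq, dq in subgraphs(EQ, DQ)
               if graph_satisfiable(V, eq, dq))

# E-graph of Example 1 and the assignment set R of Example 2
V = {"a", "b", "c"}
EQ = [("a", "b"), ("a", "c")]
DQ = [("c", "b")]
R = [dict(a=0, b=0, c=0), dict(a=0, b=0, c=1), dict(a=0, b=1, c=0)]
```

On this data the check confirms that R is adequate, that the full graph itself is unsatisfiable, and that the single all-zero assignment alone would not be adequate.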

The range allocation procedure of [PRSS99] calculates an adequate assignment
set $R$ for a given input E-graph $G$. In that procedure, the resulting $R$ has an extra
property: every $\alpha \in R$ is diverse w.r.t. $G$. By this we mean that for every
$u, v \in V(G)$, if $u$ and $v$ are not connected via equality edges in $G$, then
$\alpha(u) \neq \alpha(v)$. In [PRS01] we show how to alter any range allocator so that its
output assignment set will be diverse w.r.t. the input E-graph (while retaining
adequacy), without increasing the assignment set size. In light of this, we alter
Definition 2 and Definition 3 by considering only assignments that are diverse
w.r.t. $G$ (replace "assignment" by "assignment that is diverse w.r.t. $G$" in both
these definitions). This leaves Proposition 1 true, does not cause an increase in
the size of the possible adequate assignment sets (as we just commented), and
makes it easier for us to find an adequate E-graph for a given E-formula.
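The diversity condition is easy to state as code. The sketch below is only an illustration of the definition; the function names and the four-variable example are assumptions, not part of [PRSS99] or [PRS01].

```python
def equality_components(vertices, eq_edges):
    """Map every vertex to a representative of its equality component."""
    rep = {v: v for v in vertices}

    def find(x):
        while rep[x] != x:
            x = rep[x]
        return x

    for u, v in eq_edges:
        rep[find(u)] = find(v)
    return {v: find(v) for v in vertices}


def is_diverse(alpha, vertices, eq_edges):
    """alpha is diverse w.r.t. G if vertices that are NOT connected by
    equality edges always receive different values."""
    comp = equality_components(vertices, eq_edges)
    return all(alpha[u] != alpha[v]
               for i, u in enumerate(vertices)
               for v in vertices[i + 1:]
               if comp[u] != comp[v])


# Hypothetical graph with a single equality edge between x and y.
VERTICES = ["f1", "f2", "x", "y"]
EQ = [("x", "y")]
DIVERSE = {"f1": 0, "f2": 1, "x": 2, "y": 2}
NOT_DIVERSE = {"f1": 0, "f2": 0, "x": 2, "y": 2}
```

Note that diversity does not force connected vertices to be equal; it only forbids equal values for unconnected ones.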

We will now rephrase the decision procedure for the satisfiability of UF-formulas
as suggested in [PRSS99] according to the above definitions:

1. Reduce UF-formula $\varphi$ to E-formula $\psi$ using Ackermann's reduction.

2. Calculate the E-graph $G(\psi)$.

3. Calculate an adequate set of assignments $R$ for $G(\psi)$.

4. Check if any of the assignments in $R$ satisfies $\psi$. (This step is done
symbolically, not by exhaustive search of $R$.)

In this paper we alter Steps 1 and 2 of this procedure by replacing the reduction
scheme, and by calculating a different adequate E-graph for $\psi$. We will later
show that these changes guarantee smaller state spaces and thus a more efficient
procedure.



4 Bryant et al. Reduction Method 



We will denote this type of reduction of a UF-formula $\varphi$ to an E-formula $\psi$ by
$T^B(\varphi)$. The main property of $T^B(\varphi)$ is that it is satisfiable iff $\varphi$ is satisfiable.

The formula $T^B(\varphi)$ is given by replacing, for all $i$, the function application $F_i$
in $\varphi$ by a new term $F_i^*$. We explain the reduction using an example (see [PRS01]
or [BV98] for details):

Example 3. Consider the following formula: 



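$\varphi_1 := [F(F(F(y))) \neq F(F(y))] \wedge [F(F(y)) \neq F(x)] \wedge [x = F(y)]$

We number the function applications such that applications with syntactically
equivalent arguments are given the same index number:

$\varphi_1 := [F_4(F_3(F_1(y))) \neq F_3(F_1(y))] \wedge [F_3(F_1(y)) \neq F_2(x)] \wedge [x = F_1(y)]$

$T^B(\varphi_1)$ is given by:

$T^B(\varphi_1) := (F_4^* \neq F_3^*) \wedge (F_3^* \neq F_2^*) \wedge (x = F_1^*)$

where

$F_1^* := f_1$

$F_2^* := \begin{cases} f_1 & x = y; \\ f_2 & \text{otherwise;} \end{cases}$

$F_3^* := \begin{cases} f_1 & F_1^* = y; \\ f_2 & F_1^* = x; \\ f_3 & \text{otherwise;} \end{cases}$

$F_4^* := \begin{cases} f_1 & F_3^* = y; \\ f_2 & F_3^* = x; \\ f_3 & F_3^* = F_1^*; \\ f_4 & \text{otherwise.} \end{cases}$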

The general idea is that for every function application $F_j$ of $\varphi$ we define a new
variable $f_j$ which is the "basic" value of $F_j$. This means that $F_j^* = f_j$ if no
smaller (index wise) function application "overrides" $f_j$. This can happen when
there is some $i < j$ such that the arguments of $F_i$ and $F_j$ are equal. In this case,
for the minimal such $i$, we have $F_j^* = f_i$.
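The override rule can be made concrete by evaluating the terms $F_j^*$ under a given assignment. The following is a minimal sketch under stated assumptions: the encoding of applications as `(f_variable, argument)` pairs, the `("app", k)` marker for nested arguments, and the sample assignment are all hypothetical choices made for illustration, not the implementation of [BV98] or [PRS01].

```python
def bryant_values(apps, alpha):
    """Evaluate F_1^*, ..., F_n^* under the assignment alpha.

    apps lists the function applications in index order. Each entry is
    (f_var, arg), where arg is either a variable name or ("app", k),
    meaning "the value of the k-th application (0-based)".
    """
    star = []      # star[j] holds the value of F_{j+1}^*
    arg_vals = []  # arg_vals[j] holds the value of its argument
    for j, (f_var, arg) in enumerate(apps):
        a = star[arg[1]] if isinstance(arg, tuple) else alpha[arg]
        value = alpha[f_var]              # the "basic" value f_j
        for i in range(j):                # smallest earlier index wins
            if arg_vals[i] == a:
                value = alpha[apps[i][0]]
                break
        arg_vals.append(a)
        star.append(value)
    return star


# The numbered applications of Example 3: F1(y), F2(x), F3(F1(y)), F4(F3(F1(y))).
APPS = [("f1", "y"), ("f2", "x"), ("f3", ("app", 0)), ("f4", ("app", 2))]

# A hypothetical assignment in which x = f1, as the third conjunct requires.
ALPHA = {"x": 0, "y": 1, "f1": 0, "f2": 2, "f3": 3, "f4": 4}

F1, F2, F3, F4 = bryant_values(APPS, ALPHA)
T_B = (F4 != F3) and (F3 != F2) and (ALPHA["x"] == F1)
```

Under this assignment the argument of $F_3$ equals $x$, so $F_3^*$ is overridden by $f_2$ and the conjunct $F_3^* \neq F_2^*$ fails. This is expected: $x = F(y)$ forces $F(F(y)) = F(x)$ in the original formula.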

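In comparison, Ackermann's reduction for $\varphi_1$ is given by $T^A(\varphi_1)$:

$(y = x \rightarrow f_1 = f_2) \wedge (y = f_1 \rightarrow f_1 = f_3) \wedge$
$(y = f_3 \rightarrow f_1 = f_4) \wedge (x = f_1 \rightarrow f_2 = f_3) \wedge$
$(x = f_3 \rightarrow f_2 = f_4) \wedge (f_1 = f_3 \rightarrow f_3 = f_4)$
$\wedge (f_4 \neq f_3) \wedge (f_3 \neq f_2) \wedge (x = f_1)$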



A hint to why Bryant's reduction is better for our purposes is the following
claim:

Claim. For every UF-formula $\varphi$, if $\alpha \models T^A(\varphi)$ then $\alpha \models T^B(\varphi)$.

The converse does not hold. Thus, $T^B(\varphi)$ has at least as many satisfying
assignments as $T^A(\varphi)$, and therefore it should be easier to satisfy.
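The claim can be observed by brute force on a very small formula. The sketch below uses a hypothetical formula $F_1(x) = F_2(y)$, chosen only because it makes the gap visible; it does not appear in the text. Both reductions are written out by hand over the variables $x, y, f_1, f_2$ with values in $\{0, 1\}$.

```python
from itertools import product


def t_ackermann(a):
    """T^A for F1(x) = F2(y): functional consistency plus the flat formula."""
    consistency = a["x"] != a["y"] or a["f1"] == a["f2"]
    return consistency and a["f1"] == a["f2"]


def t_bryant(a):
    """T^B for F1(x) = F2(y): f1 = ITE(x = y, f1, f2)."""
    f2_star = a["f1"] if a["x"] == a["y"] else a["f2"]
    return a["f1"] == f2_star


NAMES = ["x", "y", "f1", "f2"]
ASSIGNMENTS = [dict(zip(NAMES, vals)) for vals in product(range(2), repeat=4)]
ACKERMANN_MODELS = [a for a in ASSIGNMENTS if t_ackermann(a)]
BRYANT_MODELS = [a for a in ASSIGNMENTS if t_bryant(a)]
```

Every assignment in `ACKERMANN_MODELS` also appears in `BRYANT_MODELS`, while an assignment with $x = y$ and $f_1 \neq f_2$ satisfies only the Bryant form, since $f_2$ is overridden and its value no longer matters.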

5 New E-Graph Construction 

Given a UF-formula $\varphi$, we wish to construct a minimal E-graph that will be
adequate for $T^B(\varphi)$. We will first try to disregard all function arguments.

Denote by $simp(\varphi)$ the E-formula received by replacing every function
application $F_i$ by its corresponding variable $f_i$. For example, for $\varphi_1$ of Section 4,
$simp(\varphi_1) = ((f_4 \neq f_3) \wedge (f_3 \neq f_2) \wedge (x = f_1))$. Our initial E-graph will therefore
be $G(simp(\varphi))$.

If we take for example $\varphi_2 = F_1(x) \neq F_2(y)$, then $simp(\varphi_2) = f_1 \neq f_2$.
$G(simp(\varphi_2))$ then contains just one disequality edge between $f_1$ and $f_2$. An
adequate assignment set for $G(simp(\varphi_2))$ must contain an assignment $\alpha$ that assigns
a different value to every variable in the E-graph, since $\alpha$ should be diverse w.r.t.
$G(simp(\varphi_2))$. For example: $\alpha(f_1) = 0, \alpha(f_2) = 1, \alpha(x) = 2, \alpha(y) = 3$. Since
$T^B(\varphi_2) = f_1 \neq ITE(x = y, f_1, f_2)$, we get that $\alpha \models T^B(\varphi_2)$. And so we
found an assignment that satisfies the formula.

Assume however, that our formula is slightly different:
$\varphi_3 = F_1(x) \neq F_2(y) \wedge ((x = y) \vee True)$ (see the footnote on the clause containing True).
In this case $simp(\varphi_3) = f_1 \neq f_2 \wedge ((x = y) \vee True)$. Now,
$G(simp(\varphi_3))$ will also contain an equality edge between $x$ and $y$. In this case, a
possible adequate assignment set for this E-graph contains just one assignment
$\alpha$: $\alpha(f_1) = 0, \alpha(f_2) = 1, \alpha(x) = \alpha(y) = 2$. In this case however, $\alpha \not\models T^B(\varphi_3)$.
This is because the equality edge we added indirectly caused the disequality edge
between $f_1$ and $f_2$ to be disregarded. We will therefore add a rule to augment
our E-graph with more edges in this case:

Tentative Rule 1 . If there is a disequality edge between fi and fj, add a dis- 
equality edge between their corresponding arguments. 

But this rule is not enough. We consider the following formula:

$\varphi_4 = (F_1(x) = z) \wedge (F_2(y) \neq z) \wedge ((x = y) \vee True)$

$G(simp(\varphi_4))$ appears in Figure 1 as $G_1$. In this case, the above Tentative Rule 1
does not apply, and we are left with the same problem, since a possible adequate
assignment set for this E-graph contains just one assignment $\alpha$:
$\alpha(f_1) = \alpha(z) = 0, \alpha(f_2) = 1, \alpha(x) = \alpha(y) = 2$, and $\alpha$ does not satisfy $T^B(\varphi_4)$.
This is because a disequality edge between $f_1$ and $f_2$ is only implied in this
E-graph, and so we wish to change Tentative Rule 1 so that it identifies implied
disequality requirements.

We write $u \not\approx_G v$ if there exists a simple path between $u$ and $v$ in $G$ consisting
of equality edges except for exactly one disequality edge. This is what we mean
by "implied" disequality edge. What this means is that an assignment where $u$
and $v$ differ may be needed to satisfy the formula. We alter Tentative Rule 1:

Rule 1. If for $f_i$ and $f_j$, $f_i \not\approx_G f_j$, then add a disequality edge between their
corresponding arguments.

Footnote: Of course, any decent procedure will remove the right clause, but this
True can be hidden as a more complex valid formula.

We now consider a similar UF-formula:

$\varphi_5 = (True \vee (F_1(x) = z)) \wedge (F_2(y) \neq z) \wedge (x = y)$

$G(simp(\varphi_5))$ is exactly the same as before, and Rule 1 adds the disequality edge
$(x, y)$ to give $G_2$ in Figure 1. The problem here is that a satisfying assignment
$\alpha$ must satisfy $\alpha(x) = \alpha(y)$, and therefore $\alpha(F_2^*) = \alpha(f_1)$. Since we also must
have $\alpha(F_2^*) \neq \alpha(z)$ to satisfy the formula, it implies $\alpha(f_1) \neq \alpha(z)$. This may
not necessarily happen in any assignment given by the range allocator for our
E-graph. This is because in our E-graph there is no representation for the fact
that $f_1$ may "override" $f_2$. If we add an equality edge between $f_1$ and $f_2$ it will
solve the problem. $G_3$ of Figure 1 is the result of adding this edge.

We denote by $u \approx_G v$ the case where there is an equality path between $u$ and
$v$ in $G$.

Tentative Rule 2. For $f_i$ and $f_j$, with $x_i$ and $x_j$ their corresponding arguments,
if $x_i \approx_G x_j$ then add the equality edge $(f_i, f_j)$.

This indeed solves our problem, but is not the best we can do. We have added an
equality edge between $f_1$ and $f_2$ in our example, but it was not really necessary.
We could have instead copied all edges involving $f_2$ to $f_1$. This is because there
is no need for $f_1$ to be equal to $f_2$ if their arguments are equal. All that is needed
is that the value $f_1$ gets respects all the requirements of $f_2$. Notice that this case
is asymmetric: since $f_1$ may override $f_2$, only $f_1$ is required to answer to $f_2$'s
requirements.

We change Tentative Rule 2 to the following rule:

Rule 2. For $f_i$ and $f_j$, where $i < j$, with $x_i$ and $x_j$ their corresponding arguments,
if $x_i \approx_G x_j$ then do one of the following:

1. add equality edge $(f_i, f_j)$, or

2. for every (dis)equality edge $(f_j, w)$ add a (dis)equality edge $(f_i, w)$.

And so, in our example, instead of adding an equality edge $(f_1, f_2)$, we add a
disequality edge $(f_1, z)$ (see $G_4$ of Figure 1).

The general idea of our new construction is therefore to start with
$G(simp(\varphi))$, and then apply Rule 1 and Rule 2 until no new edges are added.
There are some missing details; specifically, the second option of Rule 2 needs
to be postponed until the whole E-graph is constructed. We show the exact
E-graph construction in the next section. Notice that this construction has a
cone-of-influence flavor, since in $simp(\varphi)$ the arguments of uninterpreted functions
disappear, and then only edges emanating from edges already in the E-graph
are added.

Fig. 1. The Iterative E-Graph Construction Process. Dashed lines represent equality
edges, solid lines represent disequality edges.



6 Formal Description of E-Graph Construction 

We define an A-graph (marked by $\mathcal{A}$) to be an E-graph with the addition of
assignment edges, which are directed. For an A-graph $\mathcal{A}$ denote by $flat(\mathcal{A})$ the
E-graph resulting from replacing every assignment edge of $\mathcal{A}$ by an equality edge.

For function application $F_i$ of $\varphi$, define $arg(F_i)$ to be the variable of $T^B(\varphi)$
corresponding to the argument of $F_i$. This means that if the argument of $F_i$
is a variable $v$, then $arg(F_i) = v$, and if it is a function application $G_j$, then
$arg(F_i) = g_j$.

The E-graph construction procedure is divided into two parts:

1. A-graph construction: Given a UF-formula $\varphi$ we construct an A-graph $\mathcal{A}$:

(a) Let the vertices of $\mathcal{A}$ be the variables of $T^B(\varphi)$.

(b) Add all edges of $G(simp(\varphi))$ to $\mathcal{A}$.

(c) For every $F_i$ and $F_j$ such that $i < j$ and $arg(F_i) \approx_{flat(\mathcal{A})} arg(F_j)$, add
the following edges:

i. Add assignment edge $(f_i, f_j)$ to $\mathcal{A}$.

ii. If $f_i \not\approx_{flat(\mathcal{A})} f_j$ then add disequality edge $(arg(F_i), arg(F_j))$ to $\mathcal{A}$.

(d) Repeat step 1c until no new edges are added.

Example 4. For the UF-formula $\varphi_1$ of Example 3, the algorithm constructs
the A-graph $\mathcal{A}$ of Figure 2, while $G$ is the E-graph constructed by the
procedure suggested in [PRSS99].

2. Transforming the A-graph to an E-graph: The second step of the
procedure is to transform the A-graph $\mathcal{A}$ to an E-graph $G$. For two vertices
$u$ and $v$, we denote $v \sqsubseteq_G u$, if:

(a) for every $(v, w) \in EQ(G)$, $(u, w) \in EQ(G)$.

(b) for every $(v, w) \in DQ(G)$, $(u, w) \in DQ(G)$.

We proceed:

(a) Initially, $G = (V(\mathcal{A}), EQ(\mathcal{A}), DQ(\mathcal{A}))$.

(b) While there are vertices $u, v$ such that $(u, v)$ is an assignment edge of
$\mathcal{A}$, and neither $(u, v) \in EQ(G)$ nor $v \sqsubseteq_G u$, choose one of the following
options:

i. add edge $(u, v)$ to $EQ(G)$.

ii. A. for every $(v, w) \in EQ(G)$ add $(u, w)$ to $EQ(G)$.

B. for every $(v, w) \in DQ(G)$ add $(u, w)$ to $DQ(G)$.
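Part 1 of the procedure can be sketched as a fixed-point loop. The code below is a naive illustration under stated assumptions: undirected edges are stored as two-element `frozenset`s, assignment edges as ordered pairs, every argument is a plain variable, and the implied-disequality test enumerates simple paths by depth-first search (exponential in general, fine for toy graphs). The function names and the input, which encodes $simp(\varphi_4)$ from Section 5, are choices made for this sketch only.

```python
def neighbours(w, edges):
    """Vertices joined to w by an undirected edge in `edges`."""
    return {next(iter(e - {w})) for e in edges if w in e}


def eq_connected(u, v, eq):
    """True iff a path of equality edges joins u and v."""
    seen, stack = {u}, [u]
    while stack:
        w = stack.pop()
        if w == v:
            return True
        for t in neighbours(w, eq) - seen:
            seen.add(t)
            stack.append(t)
    return False


def implied_diseq(u, v, eq, dq):
    """True iff a simple path from u to v uses exactly one disequality edge."""
    def dfs(w, visited, used_dq):
        if w == v:
            return used_dq
        if any(dfs(t, visited | {t}, used_dq)
               for t in neighbours(w, eq) - visited):
            return True
        return (not used_dq) and any(
            dfs(t, visited | {t}, True) for t in neighbours(w, dq) - visited)
    return dfs(u, {u}, False)


def build_a_graph(simp_eq, simp_dq, apps):
    """apps lists (f_var, arg_var) in index order; returns (eq, dq, assign)."""
    eq = {frozenset(e) for e in simp_eq}
    dq = {frozenset(e) for e in simp_dq}
    assign = set()
    changed = True
    while changed:                       # step (d): repeat until stable
        changed = False
        flat_eq = eq | {frozenset(e) for e in assign}
        for i in range(len(apps)):
            for j in range(i + 1, len(apps)):
                (fi, ai), (fj, aj) = apps[i], apps[j]
                if not eq_connected(ai, aj, flat_eq):
                    continue
                if (fi, fj) not in assign:           # step (c) i
                    assign.add((fi, fj))
                    changed = True
                arg_edge = frozenset((ai, aj))
                if (ai != aj and arg_edge not in dq
                        and implied_diseq(fi, fj, flat_eq, dq)):
                    dq.add(arg_edge)                 # step (c) ii
                    changed = True
    return eq, dq, assign


# simp(phi_4): (f1 = z) and (f2 != z) and ((x = y) or True)
EQ, DQ, ASSIGN = build_a_graph(
    simp_eq=[("f1", "z"), ("x", "y")],
    simp_dq=[("f2", "z")],
    apps=[("f1", "x"), ("f2", "y")],
)
```

On this input the loop adds the assignment edge from $f_1$ to $f_2$ and the disequality edge between $x$ and $y$, matching the effect of Rule 1 described for $G_2$ in Section 5.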

Theorem 1. If E-graph $G$ is constructed by the above procedure run on
UF-formula $\varphi$, then $G$ is adequate for $T^B(\varphi)$.

Note that Part 2 of the procedure requires a choice between two options. In
our implementation we choose greedily between the two options, choosing the
one which minimizes the number of equality edges added to $G$.

Example 5. $G_1$ and $G_2$ in Figure 2 are the two possible E-graphs resulting from
applying Part 2 to $\mathcal{A}$. As we can see, both $G_1$ and $G_2$ are much smaller than $G$
(the E-graph constructed by [PRSS99]). In fact, we can show that any adequate
assignment set for $G$ is of size at least 16, and on the other hand, there is an
assignment set of size 4 for $G_1$, and of size 2 for $G_2$.
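The greedy choice in Part 2 can be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation: option ii is taken only when it adds no equality edge at all (ties go to option i), edges $(v, u)$ back to $u$ itself are skipped when copying, and the input is the hypothetical A-graph for $\varphi_4$ written out by hand.

```python
def neighbours(w, edges):
    """Vertices joined to w by an undirected edge in `edges`."""
    return {next(iter(e - {w})) for e in edges if w in e}


def dominated(v, u, eq, dq):
    """True iff every edge (v, w) has a counterpart (u, w) of the same
    kind (edges between v and u themselves are ignored: an assumption)."""
    return (neighbours(v, eq) - {u} <= neighbours(u, eq)
            and neighbours(v, dq) - {u} <= neighbours(u, dq))


def to_e_graph(eq, dq, assign):
    """Turn an A-graph (eq, dq, assign) into an E-graph (eq, dq)."""
    eq, dq = set(eq), set(dq)
    changed = True
    while changed:
        changed = False
        for u, v in sorted(assign):
            if frozenset((u, v)) in eq or dominated(v, u, eq, dq):
                continue                       # this assignment edge is handled
            copy_eq = {frozenset((u, w)) for w in neighbours(v, eq) - {u}} - eq
            copy_dq = {frozenset((u, w)) for w in neighbours(v, dq) - {u}} - dq
            if not copy_eq:
                dq |= copy_dq                  # option ii: no equality edge added
            else:
                eq.add(frozenset((u, v)))      # option i: one equality edge
            changed = True
    return eq, dq


def edge(a, b):
    return frozenset((a, b))


A_EQ = {edge("f1", "z"), edge("x", "y")}
A_DQ = {edge("f2", "z"), edge("x", "y")}
A_ASSIGN = {("f1", "f2")}
G_EQ, G_DQ = to_e_graph(A_EQ, A_DQ, A_ASSIGN)
```

For this input the only edge of $f_2$ is the disequality with $z$, so the sketch copies it to $f_1$ instead of adding an equality edge between $f_1$ and $f_2$, which is the choice described for $G_4$ in Section 5.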




Fig. 2. Dashed lines represent equality edges, solid lines represent disequality edges, 
and dashed directed lines represent assignment edges. 



7 Comparison with Previous Methods 

If we examine the E-graph construction of [PRSS99], we see that it is basically
the same as this new construction, except that there is no conditioning on when
to add new edges; instead, they are always added. In other words, remove all
conditions of Step 1c in Part 1 of the procedure, and for every $F_i$ and $F_j$ add a
disequality edge between their arguments, and an equality edge between $f_i$ and
$f_j$. Therefore, our E-graph will always be smaller than in [PRSS99], resulting in
a smaller state space.

In [BGV99], it is proved that for a UF-formula $\varphi$ in positive equality, every
variable of $T^B(\varphi)$ can be instantiated to a single constant. A UF-formula $\varphi$ is
said to be of positive equality if no equality terms of $\varphi$ are in the input cone of
a function application, and all equality terms of $\varphi$ appear under an odd number
of negations, i.e., they are in negative polarity (see the footnote on this
terminology). It is easy to see that our A-graph construction for such formulas
will result in an A-graph with no equality edges. Then, if we use our greedy
heuristic for Part 2 of the procedure, it will result in an E-graph consisting of
only disequality edges. An adequate range for such an E-graph contains just one
assignment, assigning each variable a distinct constant. We therefore achieve
this optimal result for the positive equality segment of the formula, while
improving on the other variables (since they give a range of $1 \ldots i$ to the $i$-th
variable, resulting in a state space of $n!$, which we will always improve upon;
see [PRSS99]).

8 Experimental Results and Conclusions 

We implemented our new graph construction procedure, and then used the range
allocator of [PRSS99] to construct a new procedure for checking satisfiability of
UF-formulas. We compared our decision procedure with that of [PRSS99] on
many example formulas that were generated by a tool for compiler translation
validation [PSS98]. The results appear in Table 1, where the prefix New denotes
the results of this paper, and the prefix Old the results of [PRSS99]. "space"
denotes the resulting assignment set size. Since in all cases encountered the
verification procedure either proved the formula valid in less than 1 sec, or ran
out of memory, we do not write the exact running time. Instead we write √ if the
run completed, and × if it didn't. "Num. vars" denotes the number of variables in
the example. There were many examples where both methods resulted in a very
small state space (and running time), and therefore we mention only those where
there was a significant difference between the two methods.



Table 1. New vs. Old E-graph Construction.

Example  New-finished  Old-finished  New-space  Old-space   Num. vars
15       √             ×             121        121         13
22       √             ×             2          9.8 · 10^?  114
25       √             ×             1          5.9 · 10^?  114
27       √             √             2          11520       26
43       √             ×             4          3.4 · 10^?  160
44       √             √             4          2.5 · 10^?  46
46       √             √             2          1.6 · 10^?  67
47       √             √             1          4.9 · 10^?  52



Footnote: The confusion between 'positive' equality and 'negative' polarity is due
to the fact that in [BGV99], where this term was introduced, the analysis referred
to validity checking, rather than satisfiability as in this paper.

As can be seen from the table, the new graph construction has an extreme
effect on the state space size. Indeed, by using the new graph construction we
were able to verify formulas which we could not with the previous method.

To conclude, we showed that the combination of the Bryant et al. reduction
method, the Pnueli et al. range allocation, and a more careful analysis of the
formula structure is very effective for verifying equality formulas with
uninterpreted functions.

Abstract. We present a model checking algorithm for $\forall$CTL (and full CTL)
which uses an iterative abstraction refinement strategy.

It terminates at least for all transition systems $M$ that have a finite simulation or
bisimulation quotient. In contrast to other abstraction refinement algorithms, we
always work with abstract models whose sizes depend only on the length of the
formula $\Phi$ (but not on the size of the system, which might be infinite).



1 Introduction 

The state explosion problem is still the major problem for applying model checking
to systems of industrial size. Several techniques have been suggested to overcome this
limitation of model checking, including symbolic methods with BDDs [?] or
SAT-solvers [?], partial order reduction [?], compositional reasoning [?] and
abstraction [?]. See [?] for an overview.

In this paper, we concentrate on abstraction in a temporal logical setting. Let $M$ be
the concrete model that we want to verify against a temporal logical formula $\Phi$. The
rough idea of the (exact) abstraction approach is to replace $M$ by a much smaller
abstract model $A_\alpha$ with the strong preservation property stating that
$A_\alpha \models \Phi$ iff $M \models \Phi$.
The subscript $\alpha$ stands for an abstraction function that describes the relation between
concrete and abstract states. In the simplest case, $\alpha$ is just a function from the concrete
state space $S$ to the abstract state space (the state space of the abstract model $A_\alpha$). For
instance, dealing with the abstraction function $\alpha$ that assigns to each concrete state $s$
its (bi-)simulation equivalence class, we get the (bi-)simulation quotient system $M_{bis}$ or
$M_{sim}$, for which strong preservation holds if $\Phi$ is a CTL* resp. $\forall$CTL* formula [?].



Algorithm 1 Schema of the Abstraction Refinement Approach.
construct an initial abstract model $A_0$; $i := 0$;

REPEAT

    Model_Check($A_i$, $\Phi$);

    IF $A_i \not\models \Phi$ THEN $A_{i+1}$ := Refinement($A_i$, $\Phi$) FI;

    $i := i + 1$;

UNTIL $A_{i-1} \models \Phi$ or $A_i = A_{i-1}$;

IF $A_{i-1} \models \Phi$ THEN return "yes" ELSE return "no" FI.
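The control flow of this schema can be sketched in a few lines. The code below is only a skeleton under stated assumptions: `model_check` and `refine` are caller-supplied callbacks (hypothetical names), abstract models are compared with `==`, and the toy model is a plain integer standing for "how much is known", which has nothing to do with real abstract models.

```python
def abstraction_refinement(initial_model, model_check, refine):
    """Generic refinement loop: check, refine on failure, stop when the
    property holds or when refinement no longer changes the model."""
    current = initial_model
    while True:
        if model_check(current):
            return "yes"
        refined = refine(current)
        if refined == current:      # no further refinement step possible
            return "no"
        current = refined


def toy_refine(level, cap=5):
    """Toy refinement: one more unit of knowledge, up to a fixed cap."""
    return min(level + 1, cap)


ANSWER_YES = abstraction_refinement(0, lambda level: level >= 3, toy_refine)
ANSWER_NO = abstraction_refinement(0, lambda level: level >= 10, toy_refine)
```

Returning "no" at the fixed point is only sound when the final abstract model enjoys strong preservation, which is exactly the requirement the surrounding text places on the schema.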



G. Berry, H. Comon, and A. Finkel (Eds.): CAV 2001, LNCS 2102, pp. 155-E^2001. 
(c) Springer-Verlag Berlin Heidelberg 2001

If $\Phi$ is fixed these are unnecessarily large. In general, conservative abstractions that
rely on the weak preservation property, stating that $A_\alpha \models \Phi$ implies $M \models \Phi$, yield
much smaller abstract models. Such models can be used in the abstraction refinement
schema shown in Algorithm 1 (e.g. [?]). Here, Model_Check(...) denotes
any standard model checking algorithm and Refinement(...) an operator that adds
further information about the original system $M$ to $A_i$ to obtain an abstract, slightly
more "concrete" model $A_{i+1}$. A necessary property that ensures partial correctness of
the above abstraction refinement technique is the strong preservation property for the
final abstract model which might be obtained when no further refinement steps are
possible.

The major difficulty is the design of a refinement procedure which on one hand should
add enough information to the abstract model such that the "chances" to prove or
disprove the property $\Phi$ in the next iteration increase in a reasonable measure, while on
the other hand the resulting new abstract model $A_{i+1}$ should be reasonably small. The
first goal can be achieved by specification-dependent refinement steps such as
counterexample guided strategies [?] where the current abstract model $A_i$ is refined
according to an error trace that the model checker has returned for $A_i$, or by strategies
that work with under- and/or overapproximations for the satisfaction relation of the
concrete model, e.g. [?]. To keep the abstract models reasonably small, two
general approaches can be distinguished. One approach focuses on small symbolic
BDD representations of the abstract models (e.g. [?]), while other approaches
attempt to minimize the number of abstract states (e.g. [?]). While
most of the fully automatic methods are designed for very large but finite concrete
systems, most abstraction refinement techniques for infinite systems are semi-automatic
and use a theorem prover to perform the refinement step or to provide the initial model
$A_0$ [?]. An entirely automatic abstraction technique that can treat infinite
systems is presented in [?].

Our Contribution: In this paper, we present an abstraction refinement algorithm that
works with abstract models with a fixed state space that just depends on the specification
(temporal logical formula) but not on the concrete system. In our approach, the concrete
system $M$ to be verified is an ordinary (very large or infinite) transition system. We use
the general abstraction framework suggested in [?] and deal with abstract models
with two transition relations. Although our ideas work for full CTL, we provide the
explanations for the sublogic $\forall$CTL for which the formalisms are simpler.

The rough idea of our algorithm is the use of abstract models that are approximations
of $A_\Phi$, the abstract model that results from the original model $M$ when we collapse all
states that satisfy the same subformulas of $\Phi$. (Here, $\Phi$ is the formula we want to check
for $M$.) Of course, the computation of the abstract model $A_\Phi$ would be at least as hard
as model checking the original system $M$. Anyway, we can use the state space of $A_\Phi$
(which consists of sets of subformulas of $\Phi$ or their negations) for the abstract models
$A_i$. Thus, the size of any of the abstract models $A_i$ is at most exponential in the length
$|\Phi|$ of the formula, independent of the size of the concrete system, which might be
infinite. Any abstract model $A_i$ is equipped with an abstraction function $\alpha_i$ which stands
for partial knowledge about the satisfaction relation $\models_M$ in the concrete system $M$.
The abstraction function $\alpha_i$ maps any concrete state $s$ to the abstract state $\sigma = \alpha_i(s)$ in
$A_i$ consisting of those subformulas $\Psi$ of $\Phi$ where we already know that $s \models_M \Psi$
and all those negated subformulas $\neg\Psi$ where $s \not\models_M \Psi$ is already shown. Refining
$A_i$ means adding more information about the concrete satisfaction relation $\models_M$,
resulting in an abstract model $A_{i+1}$ where $\alpha_{i+1}(s)$ is a superset of $\alpha_i(s)$. Partial
correctness of our algorithm is guaranteed for (concrete) transition systems of arbitrary
size. Our algorithm terminates at least if the concrete system has a finite simulation or
bisimulation quotient. The only theoretical requirement for an entirely automatic
implementation is the effectiveness of the dual predecessor predicate in the concrete
system.

Related Work: Our methodology borrows ideas from many other abstraction refinement
algorithms. We work with under- and overapproximations for the concrete satisfaction
relation $\models_M$ that we derive from the abstraction function $\alpha_i$. Although such
"sandwich" techniques are used by several other authors, e.g. [?], we are not
aware of any other method that is designed for general (possibly infinite) transition
systems and works with abstract models of a fixed size. Our methodology is also close to
the framework of [?] where an abstraction refinement algorithm for $\forall$CTL and finite
concrete transition systems is presented. [?] only needs underapproximations for the
concrete satisfaction relation. The major difference to our algorithm is the treatment of
formulas with a least or greatest fixed point semantics (such as $\forall\Diamond\Psi$ and $\forall\Box\Psi$) in the
refinement step (footnote 1). Abstraction techniques with under- and/or
overapproximations that focus on abstract models with small BDD representations are
presented in [?]. We also use ideas of stable partitioning algorithms for computing the
quotient space with respect to simulation or bisimulation like equivalences [?]. However,
instead of splitting blocks (sets of concrete states that are identified in the current
abstract model) into new subblocks (and thus, creating new abstract states), our approach
refines the abstract model by moving subblocks from one abstract state to another
abstract state (which presents more knowledge about the satisfaction relation $\models_M$).

The method is also loosely related to tableau based methods as presented in [?].

Outline: In Section 2, we explain our notation concerning transition systems, CTL and
briefly recall the basic results on abstract interpretations which our algorithm relies on.
The type of abstract models used in our algorithm is introduced in Section 3. Section 4
presents our abstraction refinement algorithm for $\forall$CTL and sketches the ideas to handle
full CTL. Section 5 concludes the paper.



2 Preliminaries 



We expect some background knowledge on transition systems, temporal logic, model
checking, abstraction and only explain the notations used throughout this paper. For
further details see, e.g. [?].



Transition Systems: A transition system is a tuple $M = (S, \rightarrow, I, AP, L)$ where $S$ is
a set of states, $I \subseteq S$ the set of initial states, $AP$ a finite set of atomic propositions and
$L : S \rightarrow 2^{AP}$ a labeling function which assigns to any state $s \in S$ the set $L(s)$ of atomic
propositions that hold in $s$. $\rightarrow \subseteq S \times S$ denotes the transition relation. Let
$Post(s) = \{s' \in S : s \rightarrow s'\}$, $Pre(B) = \{s \in S : Post(s) \subseteq B\}$. A path in a transition
system is a maximal sequence $\pi = s_0 \rightarrow s_1 \rightarrow \ldots$ of states such that
$s_i \in Post(s_{i-1})$, $i = 1, 2, \ldots$.
Here, maximality means that either $\pi$ is infinite or ends in a terminal state (i.e., a state
without successors).

Footnote 1: Our refinement operator works with a "one-step-lookahead" while [?] treats
paths that might have length > 1. In fact, this explains why underapproximations are
sufficient in the framework of [?] while we need both under- and overapproximations
to mimic the standard least or greatest fixed point computation. The fact that we just
refine according to single transitions (paths of length 1) makes it possible to treat
infinite systems.
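For a finite transition system the two operators are one-liners. The sketch below assumes transitions are given as a set of `(source, target)` pairs; the names `post` and `pre` and the three-state example are illustrative assumptions.

```python
def post(transitions, s):
    """Post(s): all direct successors of s."""
    return {t for (u, t) in transitions if u == s}


def pre(states, transitions, B):
    """Pre(B) as defined above: states all of whose successors lie in B.
    A terminal state has no successors and is therefore always included."""
    B = set(B)
    return {s for s in states if post(transitions, s) <= B}


STATES = {0, 1, 2}
TRANSITIONS = {(0, 1), (0, 2), (1, 2), (2, 2)}
```

Here state 0 has a successor outside `{2}`, so it is excluded from `pre(STATES, TRANSITIONS, {2})`, while states 1 and 2 are included.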



Computation Tree Logic (CTL): CTL (state) formulas in positive normal form are
built from the following grammar.

$\Phi ::= true \mid false \mid a \mid \neg a \mid \Phi_1 \wedge \Phi_2 \mid \Phi_1 \vee \Phi_2 \mid \forall\varphi \mid \exists\varphi$

$\varphi ::= X\Phi \mid \Phi_1 U \Phi_2 \mid \Phi_1 \widetilde{U} \Phi_2$

with $a \in AP$. Here, $X$ and $U$ are the standard temporal modalities "Next step" and
"Until" while $\widetilde{U}$ denotes "weak until" (also often called "unless"; see footnote 2).
Operators for modelling "eventually" or "always" are derived as usual, e.g.
$\forall\Diamond\Phi = \forall\, true\, U\, \Phi$ and $\forall\Box\Phi = \forall\, \Phi\, \widetilde{U}\, false$. The universal fragment of CTL
(where the application of "$\exists$" is not allowed) is denoted by $\forall$CTL. Similarly, $\exists$CTL
denotes the existential fragment of CTL. The satisfaction relation $\models_M$ for CTL formulas
and transition systems $M$ is defined in the standard way. The satisfaction set for $\Phi$ in $M$
is given by $Sat_M(\Phi) = \{s \in S : s \models_M \Phi\}$.
We write $M \models \Phi$ iff $\Phi$ holds for any initial state, i.e., iff $I \subseteq Sat_M(\Phi)$. Although
negation is only allowed on the level of atomic propositions, we shall use expressions of
the type $s \models_M \neg\Psi$ (with the intended meaning $s \models_M \neg\Psi$ iff $s \not\models_M \Psi$).



Abstract Interpretations: Let $M = (S, \rightarrow, I, AP, L)$ be a transition system that models
the "concrete system" (that we want to verify). Let $S_A$ be an arbitrary set of "abstract
states". In what follows, we use the Latin letter $s$ for concrete states (i.e., states $s \in S$) and
the Greek letter $\sigma$ for abstract states (i.e., states $\sigma \in S_A$). An abstraction function for $M$
(with range $S_A$) is a function $\alpha : S \rightarrow S_A$ such that $\alpha(s) = \alpha(s')$ implies $L(s) = L(s')$. The
induced concretization function $\gamma : S_A \rightarrow 2^S$ is just the inverse image function
$\gamma = \alpha^{-1}$ (that is, $\gamma(\sigma) = \{s \in S : \alpha(s) = \sigma\}$). We use the results of [?] and associate
with $\alpha$ two transition relations $\rightarrow_\alpha$ (which we shall use to get underapproximations for
the satisfaction sets $Sat_M(\cdot)$) and $\rightsquigarrow_\alpha$ (yielding overapproximations). They are given by

$\sigma \rightarrow_\alpha \sigma'$ iff $\exists s \in \gamma(\sigma)\ \exists s' \in \gamma(\sigma')$ s.t. $s \rightarrow s'$
$\sigma \rightsquigarrow_\alpha \sigma'$ iff $\forall s \in \gamma(\sigma)\ \exists s' \in \gamma(\sigma')$ s.t. $s \rightarrow s'$.

For any (abstract) path $\sigma_0 \rightsquigarrow_\alpha \sigma_1 \rightsquigarrow_\alpha \ldots$ and concrete state $s_0 \in \gamma(\sigma_0)$, there is a
(concrete) path $s_0 \rightarrow s_1 \rightarrow \ldots$ in $M$ such that $\alpha(s_i) = \sigma_i$, $i = 0, 1, \ldots$, while the
corresponding statement for $\rightarrow_\alpha$ may be wrong. Vice versa, any (concrete) path
$s_0 \rightarrow s_1 \rightarrow \ldots$ in $M$ can be lifted to a path $\sigma_0 \rightarrow_\alpha \sigma_1 \rightarrow_\alpha \ldots$ where $\sigma_i = \alpha(s_i)$.

Let $U = (S_A, \rightarrow_\alpha, I_\alpha, AP, L_\alpha)$ and $O = (S_A, \rightsquigarrow_\alpha, I_\alpha, AP, L_\alpha)$ be the transition systems
with state space $S_A$ where the set of abstract initial states is $I_\alpha = \alpha(I) = \{\alpha(s) : s \in I\}$.
The abstract labeling function $L_\alpha : S_A \rightarrow 2^{AP}$ is given by $L_\alpha(\sigma) = L(s)$ for some/all
concrete states $s \in \gamma(\sigma)$. Then, we have weak preservation of the following type.

Footnote 2: Any ordinary CTL formula (where also negation is allowed in the state
formulas) can be transformed into positive normal form. Note that the dual to the until
operator (often called the "release operator") can be obtained by
$\neg(\neg\Phi_1 U \neg\Phi_2) = (\neg\Phi_1 \wedge \Phi_2)\, \widetilde{U}\, (\Phi_1 \wedge \Phi_2)$.

Lemma 1. (cf. [?]) Let $s$ be a concrete state.

(1) If $\Psi$ is a $\forall$CTL formula and $\alpha(s) \models_U \Psi$ then $s \models_M \Psi$.

(2) If $\Psi$ is a $\exists$CTL formula and $\alpha(s) \models_O \Psi$ then $s \models_M \Psi$.
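For a finite concrete system both abstract transition relations can be computed directly from their definitions. The sketch below is an illustration only: the names `may` and `must` for the exists-exists and forall-exists relations are labels chosen here, labels are ignored, and the four-state system with abstract states "A", "B", "C" is hypothetical.

```python
def abstract_relations(states, transitions, alpha):
    """Return (may, must): the exists-exists relation used for the system U
    and the forall-exists relation used for the system O."""
    gamma = {}                                  # concretization: sigma -> block
    for s in states:
        gamma.setdefault(alpha[s], set()).add(s)
    post = {s: {t for (u, t) in transitions if u == s} for s in states}
    may, must = set(), set()
    for a, block_a in gamma.items():
        for b, block_b in gamma.items():
            if any(post[s] & block_b for s in block_a):
                may.add((a, b))
            if all(post[s] & block_b for s in block_a):
                must.add((a, b))
    return may, must


STATES = [0, 1, 2, 3]
TRANSITIONS = {(0, 2), (1, 3), (2, 2), (3, 3)}
ALPHA = {0: "A", 1: "A", 2: "B", 3: "C"}
MAY, MUST = abstract_relations(STATES, TRANSITIONS, ALPHA)
```

In this example the abstract state "A" has two outgoing transitions in the first relation but none in the second, because its two concrete states disagree on where they go. This shows how the overapproximating system can acquire terminal states even when the concrete system has none.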

3 Abstract O-Models 

Throughout this paper, we assume a fixed concrete transition system $M = (S, \rightarrow, I, AP, L)$
without terminal states and a $\forall$CTL formula $\Phi$. When we refer to a subformula then
we mean a formula which is not a constant true or false. $sub(\Phi)$ denotes the set of all
subformulas of $\Phi$. We may assume that $AP \subseteq sub(\Phi)$. We refer to any subformula of
$\Phi$ of the form $\Psi = \forall\varphi$ as a $\forall$subformula of $\Phi$.



The Abstract State Space $S_\Phi$: Let $cl(\Phi)$ denote the set of all subformulas $\Psi$ of $\Phi$ and
their negation (where we identify $\neg\neg a$ and $a$). I.e., $cl(\Phi) = sub(\Phi) \cup \{\neg\Psi : \Psi \in sub(\Phi)\}$.
We define the set $S_\Phi \subseteq 2^{cl(\Phi)}$ as follows. $S_\Phi$ denotes the set of $\sigma \subseteq cl(\Phi)$ such
that the following conditions (i) and (ii) hold. (i) For any atomic proposition $a \in AP$
and $\sigma \in S_\Phi$, either $a \in \sigma$ or $\neg a \in \sigma$. (ii) asserts the consistency of $\sigma$ with respect to
propositional logic and local consistency with respect to "until" and "weak until". We
just mention the axioms for "until" (see footnote 3).

1. If $\Psi_2 \in \sigma$ and $\forall\Psi_1 U \Psi_2 \in sub(\Phi)$ then $\forall\Psi_1 U \Psi_2 \in \sigma$.

2. If $\Psi_2 \notin \sigma$ and $\forall\Psi_1 U \Psi_2 \in \sigma$ then $\Psi_1 \in \sigma$ (provided that $\Psi_1 \notin \{true, false\}$).

3. If $\neg\Psi_1, \neg\Psi_2 \in \sigma$ and $\forall\Psi_1 U \Psi_2 \in sub(\Phi)$ then $\neg\forall\Psi_1 U \Psi_2 \in \sigma$.

4. If $\neg\forall\Psi_1 U \Psi_2 \in \sigma$ then $\neg\Psi_2 \in \sigma$.
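The four "until" axioms amount to a simple local check on a candidate abstract state. The sketch below assumes a hypothetical encoding: atomic propositions are strings, $\forall\Psi_1 U \Psi_2$ is the tuple `("AU", psi1, psi2)`, and negation is `("not", psi)`. It checks only these four axioms, not the propositional consistency conditions.

```python
def neg(formula):
    return ("not", formula)


def until_consistent(sigma, sub):
    """Check axioms 1-4 for every subformula of the form ("AU", psi1, psi2)."""
    for f in sub:
        if not (isinstance(f, tuple) and f[0] == "AU"):
            continue
        _, p1, p2 = f
        if p2 in sigma and f not in sigma:
            return False                                   # axiom 1
        if (p2 not in sigma and f in sigma
                and p1 not in ("true", "false") and p1 not in sigma):
            return False                                   # axiom 2
        if neg(p1) in sigma and neg(p2) in sigma and neg(f) not in sigma:
            return False                                   # axiom 3
        if neg(f) in sigma and neg(p2) not in sigma:
            return False                                   # axiom 4
    return True


AU = ("AU", "a", "b")          # the formula "for all paths: a U b"
SUB = {"a", "b", AU}
```

For instance, a state containing `"b"` must also contain `AU` by axiom 1, and a state containing `AU` without `"b"` must contain `"a"` by axiom 2.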



The abstract models $U_\Phi$ and $O_\Phi$ yield precise abstractions. Let $\alpha_\Phi : S \rightarrow S_\Phi$ be given
by $\alpha_\Phi(s) = \{\Psi \in sub(\Phi) : s \models_M \Psi\} \cup \{\neg\Psi : \Psi \in sub(\Phi), s \not\models_M \Psi\}$. It is well-known
[?] that for the abstract model that we get with the abstraction function $\alpha_\Phi$ we can just
establish the weak preservation property but do not have strong preservation. However,
when we add a new atomic proposition $a_\Psi$ for any $\forall$subformula $\Psi$ of $\Phi$ then we get an
abstract model for which a slight variant of the strong preservation property holds. Let

$AP_\Phi = AP \cup \{a_\Psi : \Psi$ is a $\forall$subformula of $\Phi\}$.

We put $a_\Psi = a$ if $\Psi = a$ is an atomic proposition. Let $L_U, L_O : S_\Phi \rightarrow 2^{AP_\Phi}$ be given by

$L_U(\sigma) = \{a_\Psi \in AP_\Phi : \Psi \in \sigma\}$, $L_O(\sigma) = \{a_\Psi \in AP_\Phi : \neg\Psi \notin \sigma\}$.

When dealing with underapproximations, we use the labeling function $L_U$ while $L_O$
will serve for the overapproximations. We define $U_\Phi = (S_\Phi, \rightarrow_{\alpha_\Phi}, I_{\alpha_\Phi}, AP_\Phi, L_U)$ and
$O_\Phi = (S_\Phi, \rightsquigarrow_{\alpha_\Phi}, I_{\alpha_\Phi}, AP_\Phi, L_O)$.

Footnote 3: For "weak until" we have essentially the same axioms as for "until". The
propositional logical axioms are obvious; e.g. we require that "$\Psi \in \sigma$ implies
$\neg\Psi \notin \sigma$" and the symmetric axiom "$\neg\Psi \in \sigma$ implies $\Psi \notin \sigma$". One of the axioms for
conjunction is "$\Psi_1 \wedge \Psi_2 \in \sigma$ iff $\Psi_1 \in \sigma$ and $\Psi_2 \in \sigma$." Note that we do not require
maximality; i.e., $\Psi, \neg\Psi \notin \sigma$ is possible if $\Psi \notin AP$.

[Figure: a concrete system $M$ with its abstract states, for $\Phi = \forall\Diamond\Psi$, $\Psi = \forall\Box a$.]

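Intuitively, the labelings $L_U$ and $L_O$ with the auxiliary atomic propositions $a_\Psi$ shall
encode the information about the satisfaction set $Sat_M(\Psi)$ that might get lost with the
abstract transition relations $\rightarrow_\alpha$ and $\rightsquigarrow_\alpha$.

Example: For the concrete system $M$ shown in the picture above and the formula
$\Phi = \forall\Diamond\forall\Box a$, $M \not\models \Phi$ while $O_\Phi \models \Phi$. In our examples we depict concrete states by
circles, abstract states by ellipses. Their names are written below while the corresponding
labels are written inside the states. □

The Formulas $\hat\Psi$ and $\tilde\Psi$: For each subformula $\Psi$ of $\Phi$ we define new $\forall$CTL formulas
$\hat\Psi$ and $\tilde\Psi$ by structural induction. If $\Psi$ is true, false or a literal then $\hat\Psi = \tilde\Psi = \Psi$. If
$\Psi = \Psi_1 \vee \Psi_2$ then $\hat\Psi = \hat\Psi_1 \vee \hat\Psi_2$ and $\tilde\Psi = \tilde\Psi_1 \vee \tilde\Psi_2$. Conjunction is treated in a similar
way. The transformations for "next step", "until" and "weak until" make use of the new
atomic propositions. For $\Psi = \forall X \Psi_0$ we put $\hat\Psi = (\forall X \hat\Psi_0) \vee a_\Psi$ and
$\tilde\Psi = (\forall X \tilde\Psi_0) \wedge a_\Psi$. If $\Psi = \forall \Psi_1 U \Psi_2$ then we put $\hat\Psi = (\forall \hat\Psi_1 U \hat\Psi_2) \vee a_\Psi$ and
$\tilde\Psi = (\forall \tilde\Psi_1 U \tilde\Psi_2) \wedge a_\Psi$. Similarly, we treat weak until. It is easy to see that for any
concrete state $s$ and $\Psi \in cl(\Phi)$:

$s \models_M \Psi$ iff $\alpha_\Phi(s) \models_U \hat\Psi$ iff $\alpha_\Phi(s) \models_O \tilde\Psi$.

In the example above, we get $\tilde\Phi = (\forall\Diamond\tilde\Psi) \wedge a_\Phi$ where $\tilde\Psi = (\forall\Box a) \wedge a_\Psi$ and the desired
property $O_\Phi \not\models \tilde\Phi$.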



Abstract $\Phi$-Models: $\mathcal{U}_\Phi$ and $\mathcal{O}_\Phi$ contain all information that we need to model check
the original system against the formula $\Phi$. In our abstraction refinement algorithm
we make use of abstract models which can be viewed as approximations of $\mathcal{U}_\Phi$ and $\mathcal{O}_\Phi$.

Definition 1. An abstract O-model /or tM is a tuple Si = (oc,Y, U, O) consisting of an 
abstraction function a : 5 — > S'® with a{s) C a(j)(5) for any concrete state s G S, the 
concretization function y=OT^ : ^ S and the two transition systems U — {S^,^a 

,la,AP(^,Lqf) and O = {S^,-^afaiAPo,Lo) where /«, are as in Section^ □ 

Intuitively, the sets $\alpha(s)$ consist of all subformulas $\Psi$ of $\Phi$ where $s \models \Psi$ has already
been verified and all formulas $\neg\Psi$ where $s \not\models \Psi$ has already been shown. However,
there might be formulas $\Psi \in sub(\Phi)$ such that neither $\Psi \in \alpha(s)$ nor $\neg\Psi \in \alpha(s)$. For
such formulas $\Psi$, we do not yet know whether $s \models \Psi$.

Let $\mathcal{A} = (\alpha, \gamma, \mathcal{U}, \mathcal{O})$ be an abstract $\Phi$-model. We associate with $\mathcal{A}$ two satisfaction
relations. $\models_{\mathcal{U}}$ denotes the standard satisfaction relation for CTL and the transition
system $\mathcal{U}$. As we assume that the concrete transition system $\mathcal{M}$ has no terminal states,
all paths in $\mathcal{M}$ and $\mathcal{U}$ are infinite. However, the abstract transition system $\mathcal{O}$ might
have terminal states. For $\mathcal{O}$, we slightly depart from the standard semantics of CTL.
For the finite paths in $\mathcal{O}$, the satisfaction relation $\models_{\mathcal{O}}$ treats weak until and until in the
same way. Let $\pi = \sigma_0 \to \sigma_1 \to \cdots \to \sigma_n$ be a finite path in $\mathcal{O}$. Then, $\pi \models_{\mathcal{O}} \Psi_1 \mathsf{U} \Psi_2$ iff
$\pi \models_{\mathcal{O}} \Psi_1 \mathsf{W} \Psi_2$ iff either $\sigma_0, \sigma_1, \ldots, \sigma_n \models_{\mathcal{O}} \Psi_1$ or there is some $k \in \{0, \ldots, n\}$ with
$\sigma_0, \sigma_1, \ldots, \sigma_{k-1} \models_{\mathcal{O}} \Psi_1$ and $\sigma_k \models_{\mathcal{O}} \Psi_2$. The reason why we need this modification is
that we “reverse” the result established by EHI stating that a(j) Ho 'P implies s Hsh 'P 
for any 3CTL formula T* (compare Lemma[|] part (2), and Lemma0 part (b)) For in- 
finite paths and any type of path formulas, we deal with the usual CTL semantics in O. 
Also for the next step and weak until operator and finite paths in O, we work with the 
usual semantics. (Thus, a Ho VXT' holds for all terminal states a in O.) 

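Lemma 2. For any concrete state $s \in S$ and $\Psi \in sub(\Phi)$:

(a) If $\alpha(s) \models_{\mathcal{U}} \underline{\Psi}$ then $s \models_{\mathcal{M}} \Psi$.

(b) If $\alpha(s) \not\models_{\mathcal{O}} \overline{\Psi}$ then $s \not\models_{\mathcal{M}} \Psi$.

(c) If $\Psi \in \alpha(s)$ then $\alpha(s) \models_{\mathcal{U}} \underline{\Psi}$.

(d) If $\neg\Psi \in \alpha(s)$ then $\alpha(s) \not\models_{\mathcal{O}} \overline{\Psi}$.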

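Any abstract $\Phi$-model $\mathcal{A} = (\alpha, \gamma, \mathcal{U}, \mathcal{O})$ induces under- and overapproximations for the
sets $Sat_{\mathcal{M}}(\Psi) = \{s \in S : s \models_{\mathcal{M}} \Psi\}$, $\Psi \in sub(\Phi)$.

Definition 2. Let $Sat^+_{\mathcal{A}}(\Psi) = \{s \in S : \neg\Psi \notin \alpha(s)\}$, $Sat^-_{\mathcal{A}}(\Psi) = \{s \in S : \Psi \in \alpha(s)\}$. □

Lemma 3. $Sat^-_{\mathcal{A}}(\Psi) \subseteq Sat_{\mathcal{M}}(\Psi) \subseteq Sat^+_{\mathcal{A}}(\Psi)$ for any $\Psi \in sub(\Phi)$. □

Lemma 3 follows by $\alpha(s) \subseteq \alpha_\Phi(s)$. Clearly, given $\alpha$ or $\gamma$, the abstract $\Phi$-model $\mathcal{A}$
is uniquely determined. Vice versa, given over- and underapproximations $Sat^+(\Psi)$
and $Sat^-(\Psi)$ for $Sat_{\mathcal{M}}(\Psi)$ there exists a unique abstract $\Phi$-model $\mathcal{A}$ with
$Sat^+(\Psi) = Sat^+_{\mathcal{A}}(\Psi)$ and $Sat^-(\Psi) = Sat^-_{\mathcal{A}}(\Psi)$.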
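
Definition 2 can be read operationally. The following minimal Python sketch is only an illustration: the three-state example, the formula names and the encoding of a signed formula as a pair `(True, f)` or `(False, f)` are assumptions made here, not part of the paper. It represents the abstraction function as a map from concrete states to sets of decided formulas and derives both approximations.

```python
# alpha maps each concrete state to the formulas already decided for it.
# (True, f) records that f is known to hold, (False, f) that f is known to fail.
def sat_minus(alpha, f):
    """Underapproximation: states where f has already been verified."""
    return {s for s, known in alpha.items() if (True, f) in known}


def sat_plus(alpha, f):
    """Overapproximation: states where f has not been refuted."""
    return {s for s, known in alpha.items() if (False, f) not in known}


# Hypothetical example: Psi is undecided in s0, verified in s1, refuted in s2.
alpha = {
    "s0": {(True, "a")},
    "s1": {(True, "a"), (True, "Psi")},
    "s2": {(False, "a"), (False, "Psi")},
}
lower = sat_minus(alpha, "Psi")
upper = sat_plus(alpha, "Psi")
```

The undecided state `s0` lies in the overapproximation but not in the underapproximation, which is the gap that the refinement steps try to close.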

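Definition 3. $\mathcal{A} \models \Phi$ iff $\Phi \in \sigma$ for all abstract initial states $\sigma$, and $\mathcal{A} \models \neg\Phi$ iff there is
an abstract initial state $\sigma$ with $\neg\Phi \in \sigma$. □

Clearly, $\mathcal{A} \models \Phi$ iff $I \subseteq Sat^-_{\mathcal{A}}(\Phi)$ iff $\Phi \in \alpha(s)$ for any concrete initial state $s$. Similarly,
$\mathcal{A} \models \neg\Phi$ iff there is a concrete initial state $s$ such that $\neg\Phi \in \alpha(s)$. By Lemma 2 (c, d):

Lemma 4. If $\mathcal{A} \models \Phi$ then $\mathcal{M} \models \Phi$. If $\mathcal{A} \models \neg\Phi$ then $\mathcal{M} \not\models \Phi$.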

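Blocks and the Partition $\Pi_{\mathcal{A}}$: We refer to the sets $B = \gamma(\sigma)$, $\sigma \in S_\Phi$, as blocks in $\mathcal{M}$
with respect to $\mathcal{A}$. Clearly, the collection $\Pi_{\mathcal{A}}$ of all blocks in $\mathcal{M}$ is a partition
of the concrete state space $S$. It should be noticed that for any block $B \in \Pi_{\mathcal{A}}$ either
$B \subseteq Sat^+_{\mathcal{A}}(\Psi)$ or $B \cap Sat^+_{\mathcal{A}}(\Psi) = \emptyset$. The same holds for $Sat^-_{\mathcal{A}}(\Psi)$.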

4 An Abstraction Refinement Model Checking Algorithm 

Our algorithm (sketched in Algorithm 2) uses the abstraction refinement schema of
Algorithm 1. We start with an abstract $\Phi$-model $\mathcal{A}_0$ and will successively refine the
model $\mathcal{A}_i$ until $\mathcal{A}_i \models \Phi$ or $\mathcal{A}_i \models \neg\Phi$. The output of our algorithm is clear from Lemma 4.

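^ Alternatively, when we interpret a formula $\forall\varphi$ over $\mathcal{O}$ then we may use the standard semantics for CTL but
switch from $\forall\Psi_1 \mathsf{U} \Psi_2$ to the formula $\forall\Psi_1 \mathsf{U} (\Psi_2 \vee (\Psi_1 \wedge \forall X\, false))$.

^ Consider the model $\mathcal{A}$ induced by the abstraction function $\alpha(s) = \{\Psi : s \in Sat^-(\Psi)\} \cup \{\neg\Psi : s \notin Sat^+(\Psi)\}$.

^ The reader should notice that $\mathcal{A} \not\models \Phi$ is not the same as $\mathcal{A} \models \neg\Phi$. $\mathcal{A} \not\models \Phi$ and $\mathcal{A} \not\models \neg\Phi$ is possible.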



Alexander Asteroth, Christel Baier, Ulrich Aßmann



The initial abstract $\Phi$-model is the abstract $\Phi$-model $\mathcal{A}_0 = \mathcal{A}_{AP}$ that we get with the
abstraction function $\alpha_0 = \alpha_{AP} : S \to S_\Phi$ where $\alpha_{AP}(s) = [\, L(s) \cup \{\neg a : a \in AP \setminus L(s)\} \,]$.

Here and in the following, $[\sigma]$ denotes the smallest element of $S_\Phi$ containing $\sigma$.
The use of $\alpha_{AP}$ reflects the knowledge that all concrete states labeled with an atomic
proposition $a$ satisfy $a$ while $\neg a$ holds for $s$ if $a$ is an atomic proposition not in $L(s)$. The
status of more complex subformulas in $\Phi$ (whose truth value cannot be derived from
the axioms for $S_\Phi$) is still open. For the concrete system and formula $\Phi$ depicted in
the previous figure (Section 3), the initial abstract model $\mathcal{A}_0$ is as shown below.




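Algorithm 2 Main Procedure of the Abstraction Refinement Algorithm.

$\mathcal{A}_0 := \mathcal{A}_{AP}$; $i := 0$;
REPEAT
    $\mathcal{A}$ := Model_Check($\mathcal{A}_i$, $\Phi$);
    IF $\mathcal{A} \not\models \Phi$ and $\mathcal{A} \not\models \neg\Phi$ THEN
        FOR ALL $\forall$subformulas $\Psi$ of $\Phi$ DO
            IF $Sat^-_{\mathcal{A}}(\Psi) \neq Sat^+_{\mathcal{A}}(\Psi)$ THEN
                $\mathcal{A}$ := Refine($\mathcal{A}$, $\Psi$);
            ELSE
                replace $\Psi$ by the atomic proposition $a_\Psi$
            FI
        OD
    FI
    $i := i + 1$; $\mathcal{A}_i := \mathcal{A}$;
UNTIL $\mathcal{A}_i \models \Phi$ or $\mathcal{A}_i \models \neg\Phi$;
IF $\mathcal{A}_i \models \Phi$ THEN return “yes” ELSE return “no” FI.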



Model Checking the Abstract $\Phi$-Model: Let $\mathcal{A}_i = (\alpha, \gamma, \mathcal{U}, \mathcal{O})$ be the current abstract
$\Phi$-model. In any iteration, we apply a standard model checker that successively treats
all $\forall$subformulas $\Psi$ of $\Phi$ for both transition systems $\mathcal{U}$ and $\mathcal{O}$.

Let $\Psi$ be a $\forall$subformula of $\Phi$. First, we apply a standard model checking routine for
$\mathcal{U}$ and the formula $\underline{\Psi}$ to calculate the satisfaction set $\{\sigma \in S_\Phi : \sigma \models_{\mathcal{U}} \underline{\Psi}\}$.
We derive the set $NewSat(\Psi) = \{\sigma \in S_\Phi : \Psi \notin \sigma, \sigma \models_{\mathcal{U}} \underline{\Psi}\}$ of all abstract states $\sigma$
where $\Psi$ now holds while $\Psi$ did not hold in the previous iteration. By Lemma 2 part
(a), we know that $\Psi$ holds for all concrete states $s \in \bigcup\{\gamma(\sigma) : \sigma \in NewSat(\Psi)\}$. Thus,
we can improve the underapproximation $Sat^-_{\mathcal{A}}(\Psi)$ of $Sat_{\mathcal{M}}(\Psi)$ by adding all blocks
$\gamma(\sigma)$ where $\sigma \in NewSat(\Psi)$ to $Sat^-_{\mathcal{A}}(\Psi)$.

^ If a C meets all axioms concerning propositional consistencies then o can be extended (according to the axioms 
that we require for 5^) to a least superset [a] G that contains o. E.g. for 0 = 
Second, we call a standard model checker for $\mathcal{O}$ and $\overline{\Psi}$ to obtain the set
$NewSat(\neg\Psi) = \{\sigma \in S_\Phi : \neg\Psi \notin \sigma, \sigma \not\models_{\mathcal{O}} \overline{\Psi}\}$ of all abstract states $\sigma$ where $\overline{\Psi}$ is not satisfied while
$\overline{\Psi}$ did hold for $\sigma$ in the previous iteration. Lemma 2, part (b), yields that none of the
concrete states $s \in \bigcup\{\gamma(\sigma) : \sigma \in NewSat(\neg\Psi)\}$ satisfies $\Psi$. Hence, we may remove the
blocks $\gamma(\sigma)$ where $\sigma \in NewSat(\neg\Psi)$ from $Sat^+_{\mathcal{A}}(\Psi)$ (i.e., we improve the overapproximation).



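Algorithm 3 The Model-Checking-Routine Model_Check($\mathcal{A}$, $\Phi$).

Let $\gamma$ be the concretization function of $\mathcal{A}$.
FOR ALL $\forall$subformulas $\Psi$ of $\Phi$ DO
    calculate the set $NewSat(\Psi) = \{\sigma \in S_\Phi : \sigma \models_{\mathcal{U}} \underline{\Psi}$ and $\Psi \notin \sigma\}$;
    FOR ALL $\sigma \in NewSat(\Psi)$ DO
        $\gamma([\sigma \cup \{\Psi\}]) := \gamma(\sigma) \cup \gamma([\sigma \cup \{\Psi\}])$; $\gamma(\sigma) := \emptyset$
    OD;
    calculate the set $NewSat(\neg\Psi) = \{\sigma \in S_\Phi : \sigma \not\models_{\mathcal{O}} \overline{\Psi}$ and $\neg\Psi \notin \sigma\}$;
    FOR ALL $\sigma \in NewSat(\neg\Psi)$ DO
        $\gamma([\sigma \cup \{\neg\Psi\}]) := \gamma(\sigma) \cup \gamma([\sigma \cup \{\neg\Psi\}])$; $\gamma(\sigma) := \emptyset$
    OD;
OD
return the abstract $\Phi$-model induced by $\gamma$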
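
The block movement performed inside both inner loops of the model checking routine can be sketched as follows. This is a minimal Python illustration under stated assumptions: abstract states are modelled as frozensets of formula strings, and `close` stands for the closure operator that extends a set of formulas to the least abstract state containing it. Both the encoding and the helper name are hypothetical, and the example uses the identity closure for simplicity.

```python
def move_blocks(gamma, new_sat, formula, close):
    """Move each block gamma[sigma], for sigma in new_sat, to close(sigma | {formula}).

    Assumes formula is not already in sigma, as required for members of NewSat,
    so the target abstract state always differs from sigma.
    """
    for sigma in new_sat:
        target = close(sigma | {formula})
        gamma[target] = gamma.get(target, set()) | gamma[sigma]
        gamma[sigma] = set()
    return gamma


# Hypothetical instance: one abstract state whose block is moved after !Phi is derived.
old = frozenset({"!a", "!Psi"})
gamma = {old: {"s2"}}
gamma = move_blocks(gamma, [old], "!Phi", close=frozenset)
new = frozenset({"!a", "!Psi", "!Phi"})
```

No abstract state is created or deleted by the move; only the concretization function changes, which matches the remark that the new model is the one induced by the updated concretization.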



Algorithm 3 combines the two model checking fragments and returns a new abstract
$\Phi$-model $\mathcal{A}'$ = Model_Check($\mathcal{A}$, $\Phi$) with the abstraction function $\alpha'$ where $\alpha'(s)$ arises
from $\alpha(s)$ by adding $\Psi$ if $\alpha(s) \in NewSat(\Psi)$ and adding $\neg\Psi$ if $\alpha(s) \in NewSat(\neg\Psi)$.



Example: For the initial model in the running example, $NewSat(\Psi) = NewSat(\Phi) =
NewSat(\neg\Psi) = \emptyset$ while $NewSat(\neg\Phi)$ consists of the black abstract state $\sigma = \{\neg a, \neg\Psi\}$.
Therefore, we move $\gamma(\sigma)$ to $\sigma' = \{\neg a, \neg\Psi, \neg\Phi\}$ and obtain a model $\mathcal{A}$ with the following
components $\mathcal{U}$ and $\mathcal{O}$.




The refinement operator takes as input the abstract $\Phi$-model $\mathcal{A}$ that the model checker
returns and replaces $\mathcal{A}$ by another abstract $\Phi$-model $\mathcal{A}_{i+1}$ where again the under- and
overapproximations are improved. $\mathcal{A}_{i+1}$ is obtained by a sequence of refinement steps
that successively treat all $\forall$subformulas of $\Phi$. As usual, the subformulas should
be considered in an order consistent with the subformula relation. Let us assume that
$\mathcal{A}$ is the current abstract $\Phi$-model to be refined according to a $\forall$subformula $\Psi$ of $\Phi$. If
the over- and underapproximations for $\Psi$ agree in $\mathcal{A}$, i.e., if $Sat^-_{\mathcal{A}}(\Psi) = Sat^+_{\mathcal{A}}(\Psi)$, then



Any movement of blocks might change (improve) the current abstract $\Phi$-model. Thus, any FOR-loop of
Model_Check($\mathcal{A}$, $\Phi$) is started with a model that might be even better than the original model $\mathcal{A}$.
we may conclude that $Sat^-_{\mathcal{A}}(\Psi) = Sat_{\mathcal{M}}(\Psi) = Sat^+_{\mathcal{A}}(\Psi)$. As the precise satisfaction set
for $\Psi$ is known there is no need for further treatment of $\Psi$. From this point on, $\Psi$ (and
its subformulas) can be ignored. Thus, we just replace $\Psi$ by the atomic proposition $a_\Psi$.
E.g. if $\Phi = \forall X(\forall\Diamond a \wedge b)$ and $\Psi = \forall\Diamond a$ then we replace $\Phi$ by $\forall X(a_\Psi \wedge b)$. Otherwise,
i.e., if $Sat^-_{\mathcal{A}}(\Psi)$ is a proper subset of $Sat^+_{\mathcal{A}}(\Psi)$, we calculate $\mathcal{A}'$ = Refine($\mathcal{A}$, $\Psi$) by:

CASE $\Psi$ IS
    $\forall X \Psi_0$ THEN return Refine_Forall_Next($\mathcal{A}$, $\Psi$);
    $\forall \Psi_1 \mathsf{U} \Psi_2$ THEN return Refine_Forall_Until($\mathcal{A}$, $\Psi$);
    $\forall \Psi_1 \mathsf{W} \Psi_2$ THEN return Refine_Forall_WeakUntil($\mathcal{A}$, $\Psi$);
ENDCASE

First, we briefly sketch the next step operator. Let $\Psi = \forall X \Psi_0$. Clearly, all concrete
states $s$ where $Post(s) \subseteq Sat^-_{\mathcal{A}}(\Psi_0)$ satisfy $\Psi$. Similarly, only those concrete states $s$
where $Post(s) \subseteq Sat^+_{\mathcal{A}}(\Psi_0)$ are candidates to fulfill $\Psi$. Thus, we may replace $\mathcal{A}$ by the
abstract $\Phi$-model $\mathcal{A}'$ with

$Sat^+_{\mathcal{A}'}(\Psi) = \widetilde{Pre}(Sat^+_{\mathcal{A}}(\Psi_0))$, $Sat^-_{\mathcal{A}'}(\Psi) = \widetilde{Pre}(Sat^-_{\mathcal{A}}(\Psi_0))$

while the over- and underapproximations for $\Psi'$ (where $\Psi' \neq \Psi$) do not change.

This change of $\mathcal{A}$ corresponds to a splitting of the blocks $B \in \Pi_{\mathcal{A}}$ into the subblocks
$B \cap P$ and $B \setminus P$ where $P = \widetilde{Pre}(\ldots)$. The splitting is performed twice:
first for $P = \widetilde{Pre}(Sat^-_{\mathcal{A}}(\Psi_0))$ which yields an “intermediate” abstract $\Phi$-model $\mathcal{A}''$,
second we split the blocks in $\mathcal{A}''$ with the set $P = \widetilde{Pre}(Sat^+_{\mathcal{A}}(\Psi_0))$. In our algorithm
the splitting operation does not create new abstract states. Let $B = \gamma(\sigma)$ where
$\Psi, \neg\Psi \notin \sigma$ and $P = \widetilde{Pre}(Sat^-_{\mathcal{A}}(\Psi_0))$. We realize the splitting of $B$ by moving the
subblock $B \cap P$ from the abstract state $\sigma$ to the abstract state
$[\sigma \cup \{\Psi\}]$. Similarly, we treat the splitting according to the overapproximations.
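
The universal predecessor operator and the block split used in the next step refinement can be illustrated with a short Python sketch. It assumes an explicit, finite transition relation given as a set of pairs; the function names and the three-state example are hypothetical and chosen only for illustration.

```python
def post(trans, s):
    """Direct successors of s under the transition relation trans."""
    return {t for (u, t) in trans if u == s}


def pre_forall(states, trans, target):
    """Universal predecessors: states all of whose successors lie in target."""
    return {s for s in states if post(trans, s) <= target}


def split(block, p):
    """Split a block into the part inside p and the part outside p."""
    return block & p, block - p


# Hypothetical concrete system: 0 -> 1, 0 -> 2, 1 -> 1, 2 -> 2.
states = {0, 1, 2}
trans = {(0, 1), (0, 2), (1, 1), (2, 2)}
p = pre_forall(states, trans, {1})
inside, outside = split({0, 1}, p)
```

State 0 is excluded from `p` because one of its successors leaves the target set, so the block `{0, 1}` is split and only the subblock `{1}` would be moved to the abstract state that additionally contains the formula.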





The procedure for the handling of until and weak until is based on similar ideas. For
$\Psi = \forall \Psi_1 \mathsf{U} \Psi_2$ we switch from $\mathcal{A}$ to the abstract $\Phi$-model $\mathcal{A}'$ where

$Sat^-_{\mathcal{A}'}(\Psi) = Sat^-_{\mathcal{A}}(\Psi_2) \cup \big(Sat^-_{\mathcal{A}}(\Psi_1) \cap \widetilde{Pre}(Sat^-_{\mathcal{A}}(\Psi))\big)$.

Then, we check whether the least fixed point computation of $Sat_{\mathcal{M}}(\Psi)$ via the
underapproximations is finished. For this, we just need the information whether $\mathcal{A}' = \mathcal{A}$,
i.e., whether no block has been split into proper subblocks (i.e., $\gamma$ is
unchanged). If $\mathcal{A}' = \mathcal{A}$ and if $\Psi_1$ and $\Psi_2$ are propositional formulas (for which the precise
satisfaction sets are already computed) then we may conclude that $Sat^-_{\mathcal{A}'}(\Psi)$ agrees with
$Sat_{\mathcal{M}}(\Psi)$. In this case, we switch from $\mathcal{A}'$ to $\mathcal{A}''$ where $Sat^+_{\mathcal{A}''}(\Psi) = Sat^-_{\mathcal{A}'}(\Psi)$ and replace
$\Psi$ by the atomic proposition $a_\Psi$. If the computation of $Sat_{\mathcal{M}}(\Psi)$ is not yet finished then
we improve the upper bound. These ideas are presented in Algorithm 4. The treatment
of weak until in the refinement step is almost the same as for until; the only difference
is that, as we have to calculate a greatest fixed point via overapproximations, the
roles of under- and overapproximations have to be exchanged.
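
The least fixed point that the until refinement approaches from below can be made concrete on a small explicit system. The Python sketch below is an illustration only: it iterates directly on concrete states rather than on blocks of an abstract model, and the state space, transition relation and satisfaction sets are hypothetical.

```python
def pre_forall(states, trans, target):
    """States all of whose successors lie in target."""
    return {s for s in states if {t for (u, t) in trans if u == s} <= target}


def forall_until(states, trans, sat1, sat2):
    """Least fixed point of X = sat2 | (sat1 & pre_forall(X)), computed from below.

    Assumes every state has at least one successor, as the text assumes for the
    concrete system; a terminal state would be a universal predecessor vacuously.
    """
    x = set()
    while True:
        nxt = sat2 | (sat1 & pre_forall(states, trans, x))
        if nxt == x:
            return x
        x = nxt


# Hypothetical system: 0 -> 1 -> 2 -> 2 and 3 -> 3.
states = {0, 1, 2, 3}
trans = {(0, 1), (1, 2), (2, 2), (3, 3)}
result = forall_until(states, trans, sat1={0, 1, 3}, sat2={2})
```

Each intermediate value of `x` is an underapproximation of the final set, and the iteration stops exactly when one more step changes nothing, which mirrors the test on the `changed` flag in the refinement procedure.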

Example: Let us revisit the running example. Let $\mathcal{A} = (\alpha, \gamma, \mathcal{U}, \mathcal{O})$ be the current
abstract $\Phi$-model the model checker has returned in the first iteration (see the picture
above). Refinement starts with $\Psi = \forall\Box a$. We get $\widetilde{Pre}(Sat^+_{\mathcal{A}}(\Psi)) = \widetilde{Pre}(\gamma(\{a\})) =
\gamma(\{a\}) \setminus \{s_0\}$. Thus, the grey concrete initial state $s_0$ is moved to $\{a, \neg\Psi\}$. All other
refinement steps leave the model unchanged. Refine($\mathcal{A}$, $\Phi$) returns the model with
components $\mathcal{U}_1$, $\mathcal{O}_1$ as shown below.




In the following model checking phase, $NewSat(\Psi) = NewSat(\Phi) = NewSat(\neg\Psi) = \emptyset$.
$NewSat(\neg\Phi)$ consists of the grey abstract state $\sigma = \{a, \neg\Psi\}$. Therefore, we move
$\gamma(\sigma) = \{s_0\}$ to the abstract state $\sigma' = \{a, \neg\Psi, \neg\Phi\}$. We obtain an abstract $\Phi$-model $\mathcal{A}_2$
where the abstract interpretation of the concrete initial state $s_0$ is $\alpha_2(s_0) = \sigma'$. As
$\sigma'$ contains $\neg\Phi$, the condition $\mathcal{A}_2 \models \neg\Phi$ of the repeat-loop of Algorithm 2 holds (see
Def. 3). Hence, Algorithm 2 terminates with the correct answer “no”. □



Remark: There is no need for an explicit treatment of the boolean connectives $\vee$ and $\wedge$
in the model checking or refinement step. For instance, if $\Psi = \Psi_1 \vee \Psi_2$ is a subformula
of $\Phi$ then improving the approximations for the sets $Sat_{\mathcal{M}}(\Psi_i)$ automatically yields an
improvement for the underapproximation for $Sat_{\mathcal{M}}(\Psi)$. “Moving” a block $B$ from an
abstract state $\sigma$ to the abstract state $\sigma' = [\sigma \cup \{\Psi_1\}]$ has the side effect that $B$ is added
to both $Sat^-_{\mathcal{A}}(\Psi_1)$ and $Sat^-_{\mathcal{A}}(\Psi)$. This is due to the axioms we require for the elements
in $S_\Phi$. The corresponding observation holds for the overapproximations $Sat^+_{\mathcal{A}}(\cdot)$. □

Remark: The atomic propositions $a_\Psi$ play a crucial role in both the model checking and
the refinement procedure. The labelings $L_{\mathcal{U}}$ and $L_{\mathcal{O}}$ cover the information that might
get lost due to the abstract transition relations of $\mathcal{U}$ and $\mathcal{O}$. In the refinement phase, they are
necessary to detect when the computation of a least or greatest fixed point is finished.
□



Theorem 1. [Partial Correctness] If Algorithm 2 terminates with the answer “yes”
then $\mathcal{M} \models \Phi$. If Algorithm 2 terminates with the answer “no” then $\mathcal{M} \not\models \Phi$. □



Because of the similarities with stable partitioning algorithms for calculating the (bi-) 
simulation equivalence classes 1571715212^ it is not surprising that our algorithm ter- 
minates provided that the (bi-)simulation quotient space of M is finite. 



Theorem 2. [Termination] If the concrete model $\mathcal{M}$ has a finite simulation or
bisimulation quotient then Algorithm 2 terminates.
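Algorithm 4 Refine_Forall_Until($\mathcal{A}$, $\Psi$) where $\Psi = \forall \Psi_1 \mathsf{U} \Psi_2$.

Let $\gamma$ be the concretization function of $\mathcal{A}$.
$P := \widetilde{Pre}(Sat^-_{\mathcal{A}}(\Psi))$; changed := false; (* improve the underapproximation for $Sat_{\mathcal{M}}(\Psi)$ *)
FOR ALL $\sigma \in S_\Phi$ where $\Psi \notin \sigma$, $\Psi_1 \in \sigma$ and $\gamma(\sigma) \cap P \neq \emptyset$ DO
    $\gamma([\sigma \cup \{\Psi\}]) := (\gamma(\sigma) \cap P) \cup \gamma([\sigma \cup \{\Psi\}])$; $\gamma(\sigma) := \gamma(\sigma) \setminus P$; changed := true;
OD;
IF not changed and $\Psi_1$, $\Psi_2$ are propositional formulas THEN
    (* the least fixed point computation is finished; put $Sat^+_{\mathcal{A}}(\Psi) := Sat^-_{\mathcal{A}}(\Psi)$ *)
    replace $\Psi$ by the atomic proposition $a_\Psi$;
    FOR ALL $\sigma \in S_\Phi$ with $\Psi \notin \sigma$ and $\neg\Psi \notin \sigma$ DO
        $\gamma([\sigma \cup \{\neg\Psi\}]) := \gamma([\sigma \cup \{\neg\Psi\}]) \cup \gamma(\sigma)$; $\gamma(\sigma) := \emptyset$;
    OD
ELSE
    $P := \widetilde{Pre}(Sat^+_{\mathcal{A}}(\Psi))$; (* improve the overapproximation for $Sat_{\mathcal{M}}(\Psi)$ *)
    FOR ALL $\sigma \in S_\Phi$ where $\Psi \notin \sigma$, $\neg\Psi \notin \sigma$, $\neg\Psi_2 \in \sigma$ and $\gamma(\sigma) \setminus P \neq \emptyset$ DO
        $\gamma([\sigma \cup \{\neg\Psi\}]) := \gamma([\sigma \cup \{\neg\Psi\}]) \cup (\gamma(\sigma) \setminus P)$; $\gamma(\sigma) := \gamma(\sigma) \cap P$
    OD
FI
Return the abstract $\Phi$-model with concretization function $\gamma$.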



Full CTL: Our algorithm can be extended to treat full CTL. The major difference is the 
handling of existential quantification which requires the use of the transition relation 
when calculating the underapproximations while for the overapproximations we 
use the transition relation ^a- Given an abstract O-model A — (oc,y, U, O), we work 
(as before) with two satisfaction relations and \=o- E.g. O \=<u 3tp iff there exists 
a path 7 t in O (i.e., a path built from transitions w.r.t. that starts in a and 7 t 9 . 
In the refinement phase, we use the predecessor predicate Pre{-) rather than Pre{-). For 
instance, to improve the underapproximation for a subformula ‘F = BOT'o we split any 
block B = y(a) (where T* ^ o) into Bnf’re(5at^(T')) and B\Bre(Satj^('F)). Again, the 
partial correctness relies on the results of EH- Termination can be guaranteed for any 
concrete system with a finite bisimulation quotient. 

5 Concluding Remarks 

We have presented a general abstraction refinement algorithm for model checking large
or infinite transition systems against $\forall$CTL (or CTL) formulas. Partial correctness can
be established for any concrete transition system $\mathcal{M}$ which (if it is finite) could be
represented by a BDD or might be a program with variables of an infinite type.
Termination can be guaranteed for all concrete systems with a finite bisimulation quotient.
For $\forall$CTL, our algorithm terminates also if only the simulation quotient is finite.

Clearly, the feasibility of our algorithm crucially depends on the representation of
the concrete system for which we have to extract the Pre-information. In principle,
our methodology can be combined with several fully or semi-automatic techniques that
provide an abstract model. For large but finite concrete systems, we suggest a symbolic
representation of the transition relation in $\mathcal{M}$ and the blocks in $\Pi_{\mathcal{A}}$ with BDDs. We
have just started to implement our method with a BDD representation for the concrete model
M but, unfortunately, cannot yet report on experimental results. It might be interesting 
to see whether (and how) the abstraction techniques for BDDs (e.g. 11 I I can be 
combined with our algorithm. To reason about infinite systems, the fully automatic 
approach of M seems to fit nicely in our framework as it works with a Pre-operator 
similar to the one that we use. 

One of the further directions we intend to investigate is the study of real time sys- 
tems or other types of transition systems that are known to have finite (bi-)simulation 
quotients . In principle, our technique should be applicable to establish qualitative 
properties of timed automata (expressed in CTL). It would be interesting to see whether 
our method can be modified to handle quantitative properties (e.g. specified in TCTL). 



References 

1. P. Abdulla, A. Annichini, S. Bensalem, A. Bouajjani, P. Habermehl, Y. Lakhnech.
Verification of infinite state systems by combining abstraction and reachability analysis. In
Proc. CAV’99, LNCS 1633, 1999.

2. R. Alur, D. Dill. A theory of timed automata. Theoretical Computer Science, 126: 183-235, 
1994. 

3. A. Aziz, T. R. Shiple, V. Singhal, A. L. Sangiovanni-Vincentelli. Formula-dependent equiv- 
alence for compositional CTL model checking. Proc. CAV’94, LNCS 818, pp. 324-337, 
1994. 

4. A. Biere, A. Cimatti, E. Clarke, M. Fujita, Y. Zhu. Symbolic model checking using SAT
procedures instead of BDDs. In Design Automation Conference, pp. 317-320, 1999.

5. M. Browne, E. Clarke, O. Grumberg. Characterizing finite Kripke structures in
Propositional Temporal Logic. Theoretical Computer Science, 59(1-2):115-131, July 1988.

6. J. Burch, E. Clarke, K. McMillan, D. Dill, L. Hwang. Symbolic model checking: $10^{20}$ states
and beyond. Information and Computation, 1992.

7. A. Bouajjani, J.-C. Fernandez, N. Halbwachs. Minimal model generation.
Proc. CAV’90, LNCS 531, pp. 197-203, 1990.

8. D. Bustan, O. Grumberg. Simulation based minimization. Computing Science Reports
CS-2000-04, Computer Science Department, Technion, Haifa 32000, Israel, 2000.

9. S. Bensalem, Y. Lakhnech, S. Owre. Computing abstractions of infinite state systems
compositionally and automatically. Proc. CAV’98, LNCS 1427, pp. 319-331, 1998.

10. F. Balarin, A. Sangiovanni-Vincentelli. An iterative approach to language containment. 
Proc. CAV’93, LNCS 697, pp. 29-40, 1993. 

11. P. Cousot, R. Cousot. Abstract interpretation: a unified lattice model for static analysis of
programs by construction or approximation of fixpoints. In Proc. POPL’77, pp. 238-252,
1977.

12. E. Clarke, O. Grumberg, S. Jha, Y. Lu, H. Veith. Counterexample-guided abstraction
refinement. Proc. CAV’00, LNCS 1855, pp. 154-169, 2000.

13. E. Clarke, O. Grumberg, D. Long. Model checking and abstraction. ACM Transactions on
Programming Languages and Systems, 16(5):1512-1542, September 1994.

14. E. Clarke, O. Grumberg, D. Peled. Model Checking. MIT Press, 2000.

15. E. Clarke, S. Jha, Y. Lu, D. Wang. Abstract BDDs: a technique for using abstraction in
model checking. In Proc. Correct Hardware Design and Verification Methods, LNCS 1703,
pp. 172-186, 1999.

16. D. Dams. Abstract Interpretation and Partition Refinement for Model Checking. PhD thesis,
Technische Universiteit Eindhoven, 1996.






17. J. Dingel, T. Filkorn. Model checking for infinite state systems using data abstraction,
assumption commitment style reasoning and theorem proving. In Proc. CAV’95, LNCS
939, pp. 54-69, 1995.

18. D. Dams, R. Gerth, O. Grumberg. Generation of reduced models for checking fragments
of CTL. In Proc. CAV’93, LNCS 697, pp. 479-490, 1993.

19. D. Dams, R. Gerth, O. Grumberg. Abstract interpretation of reactive systems. ACM
Transactions on Programming Languages and Systems, 19(2):253-291, March 1997.

20. E. A. Emerson. Temporal and modal logic. In Jan van Leeuwen, editor. Handbook of 
Theoretical Computer Science, Volume B: Formal Models and Semantics, pp. 995-1072. 
Elsevier Science Publishers, Amsterdam, The Netherlands, 1990. 

21. O. Grumberg, D. Long. Model checking and modular verification. ACM Transactions on 
Programming Languages and Systems, 16(3):843-871, 1994. 

22. P. Godefroid. Partial order methods for the verification of concurrent systems: An approach 
to the state explosion problem (Ph.D.Thesis, University of Liege) LNCS 1032, 1996. 

23. S. Graf, H. Saidi. Construction of abstract state graphs with PVS. In Proc. CAV’97, LNCS 
1254, pp 72-83, 1997. 

24. M. Henzinger, T. Henzinger, P. Kopke. Computing simulations on finite and infinite graphs.
In Proc. FOCS’95, pp. 453-462, IEEE Computer Society Press, 1995.

25. P. Kelb, D. Dams, R. Gerth. Efficient symbolic model checking of the full μ-calculus using
compositional abstractions. Computing Science Reports 95/31, Eindhoven University of
Technology, 1995.

26. R. Kurshan. Computer-aided Verification of Coordinating Processes: The Automata-
Theoretic Approach. Princeton University Press, 1994.

27. C. Loiseaux, S. Graf, J. Sifakis, A. Bouajjani, S. Bensalem. Property preserving abstractions
for the verification of concurrent systems. Formal Methods in System Design, 6(1):11-44,
January 1995.

28. J. Lind-Nielsen, H. Andersen. Stepwise CTL model checking of State/Event systems. In 
Proc. CAV’99, LNCS 1633, pp. 316-327, 1999. 

29. D. Long. Model Checking, Abstraction and Compositional Verification. PhD thesis, 
Carnegie Mellon University, 1993. 

30. O. Lichtenstein and A. Pnueli. Checking that finite state concurrent programs satisfy their 
linear specification. In Proceedings of the Twelfth Annual ACM Symposium on Principles 
of Programming Languages, pages 97-107, New York, January 1985. ACM. 

31. W. Lee, A. Pardo, J.-Y. Jang, G. Hachtel, F. Somenzi. Tearing based automatic abstraction
for CTL model checking. In Proc. ICCAD’96, pp. 76-81, 1996.

32. D. Lee, M. Yannakakis. Online minimization of transition systems. In Proc. STOC’92, pp. 
264-274, 1992. ACM Press. 

33. K. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993. 

34. K. Namjoshi, R. Kurshan. Syntactic program transformation for automatic abstraction. In
Proc. CAV’2000, LNCS 1855, pp. 435-449, 2000.

35. D. Peled. All from one, one from all: on model checking using representatives. In
Proc. CAV’93, LNCS 697, pp. 409-423, 1993.

36. A. Pardo, G. Hachtel. Automatic abstraction techniques for propositional μ-calculus model
checking. In Proc. CAV’97, LNCS 1254, pp. 12-23, 1997.

37. R. Paige, R. Tarjan. Three partition refinement algorithms. SIAM Journal on Computing, 
16(6):973-989, 1987. 

38. H. Saidi, N. Shankar. Abstract and model check while you prove. In Proc. CAV’99, LNCS 
1633, pp 443-454, 1999. 

39. H. B. Sipma, T. E. Uribe, Z. Manna. Deductive Model Checking. In Proc. CAV’96, LNCS
1102, pp. 208-219, 1996.

40. A. Valmari. State of the art report: Stubborn sets. Petri-Net Newsletters, 46:6-14, 1994. 




Verifying Network Protocol Implementations by 
Symbolic Refinement Checking 



Rajeev Alur and Bow-Yaw Wang 

Department of Computer and Information Science 
University of Pennsylvania 
{alur,bywang}@cis.upenn.edu
http://www.cis.upenn.edu/~{alur,bywang}



Abstract. We consider the problem of establishing consistency of code
implementing a network protocol with respect to its documentation as
a standard RFC. The problem is formulated as refinement checking
between two models, the implementation extracted from code and the
specification extracted from the RFC. After simplifications based on
assume-guarantee reasoning, and automatic construction of witness modules to
deal with the hidden specification state, the refinement checking problem
reduces to checking transition invariants. The methodology is illustrated
on two case studies involving popular network protocols, namely,
PPP (point-to-point protocol for establishing connections remotely) and
DHCP (dynamic host configuration protocol for configuration management
in mobile networks). We also present a symbolic implementation
of a reduction scheme based on compressing internal transitions in a
hierarchical manner, and demonstrate the resulting savings for refinement
checking in terms of memory size.



1 Introduction 

Network protocols have been a popular domain of application for model checkers 
for over a decade (see, for instance, [15, 10]). A typical application involves check- 
ing temporal requirements, such as absence of deadlocks and eventual transmis- 
sion, of a model of a network protocol, such as TCP, extracted from a textbook 
description or a standard documentation such as a network RFC (Request for 
Comments) document. While this approach is effective in detecting logical errors 
in a protocol design, there is still a need to formally analyze the actual implemen- 
tation of the protocol standard to reveal implementation errors. While analyzing 
the code implementing a protocol, the standard specification, typically available 
as a network RFC, can be viewed as the abstract model. Since the standard 
provides implementation guidelines for different vendors on different platforms, 
analysis tools to detect inconsistencies with respect to the standard can greatly 
enhance the benefits of standardization. 

The problem of verifying a protocol implementation with respect to its
standardized documentation can naturally be formulated as refinement checking.
The implementation model $I$ is extracted from the code and the specification
model $S$ is extracted from the RFC document. We wish to verify that $I \preceq S$
holds, where the notion $\preceq$ of refinement is based on language inclusion. A
recent promising approach to automated refinement checking combines
assume-guarantee reasoning with search algorithms [19, 14, 4], and has been successfully
applied to synchronous hardware designs such as pipelined processors [20] and
a VGI chip [13].

To establish the refinement, we employ the following three-step methodology
(advocated, for instance, in [4]). First, the refinement obligation is used to
generate simpler subgoals by applying assume-guarantee reasoning [23, 2, 5, 12, 19].
This reduces the verification of a composition of implementation components to
individual components, but verifies an individual component only in the
context of the specifications of the other components. The second step concerns
verification of a subgoal $I \preceq S$ when $S$ has private variables. The classical approach is to
require the user to provide a definition of the private variables of the
specification in terms of the implementation variables (this basic idea is needed even
for manual proofs, and comes in various disguises such as refinement maps [1],
homomorphisms [17], forward-simulation maps [18], and witness modules [14,
19]). Consequently, the refinement check $I \preceq S$ reduces to $I \| W \preceq S$, where $W$
is the user-supplied witness for private variables of $S$. As a heuristic for
choosing $W$ automatically, we had proposed a simple construction that transforms
$S$ to $Eager(S)$, which is like $S$, but takes a stuttering step only when all other
choices are disabled [4]. Once a proper witness is chosen, the third and final step
requires establishing that every reachable transition of the implementation has
a matching transition of the specification, and can be done by an algorithmic
state-space analysis for checking transition invariants.
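
The third step can be pictured as a plain reachability search. The following Python sketch is a simplified, explicit-state illustration and not the symbolic procedure used in the paper: it assumes that the witness is available as a hypothetical map `witness` from implementation states to specification states, that transition relations are given explicitly, and that the specification may always stutter.

```python
from collections import deque


def check_transition_invariant(init, impl_trans, witness, spec_trans):
    """Return the first reachable implementation transition that has no matching
    specification transition, or None if every reachable transition matches."""
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        for t in sorted(impl_trans.get(s, ())):
            a, b = witness[s], witness[t]
            # a == b is a stuttering step, which the specification may always take
            if a != b and (a, b) not in spec_trans:
                return (s, t)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None


# Hypothetical implementation with three states mapped onto two specification states.
impl = {0: {1}, 1: {2}, 2: {0}}
witness = {0: "idle", 1: "busy", 2: "busy"}
ok = check_transition_invariant(0, impl, witness, {("idle", "busy"), ("busy", "idle")})
bad = check_transition_invariant(0, impl, witness, {("idle", "busy")})
```

In the second call the specification lacks the step from `busy` back to `idle`, so the search reports the implementation transition `(2, 0)` as a violation of the transition invariant.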

For performing the reachability analysis required for verifying transition
invariants efficiently, we propose an optimization of the symbolic search. The
proposed algorithm is an adaptation of a corresponding enumerative scheme based
on compressing unobservable transitions in a hierarchical manner [6]. The
basic idea is to describe the implementation $I$ in a hierarchical manner so that $I$
is a tree whose leaves are atomic processes, and internal nodes compose their
children and hide as many variables as possible. This suggests a natural
optimization: while computing the successors of a state corresponding to the execution of
a process, apply the transition relation repeatedly until a shared variable is
accessed. A more effective strategy is to apply the reduction in a recursive manner
exploiting the hierarchical structure. In this paper, we show how this hierarchical
scheme can be implemented symbolically, and establish significant reductions in
space and time requirements.

Our methodology for refinement checking is implemented in the model checker 
Mocha [3]. Our first case study involves verifying part of the RFC specifica- 
tion of Point-to-Point Protocol (PPP) widely used to transmit multi-protocol 
datagrams [22]. The implementation ppp version 2.4.0 is an open-source pack- 
age included in various Linux distributions. We extract the model ppp of the 
specification and the model pppd of the implementation manually. To establish
the refinement, we need to assume that the communication partner behaves like
the specification model, and thus employ assume-guarantee reasoning. The
specification has many private variables, and we use the “eager witness” construction
to reduce the problem to a transition invariant check. Our analysis reveals an
inconsistency between the C code and the RFC document. The second case study
concerns the Dynamic Host Configuration Protocol (DHCP) that provides a 
standard mechanism to obtain configuration parameters. We analyze the dhcp 
package version 2.0 patch level 5, the standard implementation distributed by 
Internet Software Consortium, with respect to its specification RFC 2131 [11]. 

2 Refinement Checking 

In this section, we summarize the definition of processes, refinement relation 
over processes, and the methodology for refinement checking. The details can be 
found in [4]. 

The process model is a special class of reactive modules [5] that corresponds 
to asynchronous processes communicating via read-shared write-exclusive vari- 
ables. A process is defined by the set of its variables, along with the constraints 
for initializing and updating variables. The variables of a process P are par- 
titioned into three classes: private variables that cannot be read nor written 
by other processes, interface variables that are written only by P, but can be 
read by other processes, and external variables that can only be read by P, and 
written by other processes. Thus, interface and external variables are used for 
communication, and are called observable variables. The process controls its pri- 
vate and interface variables, and the environment controls the external variables. 
The separation between private and observable variables is essential to applying 
our optimization algorithm based on compressing internal transitions. The state 
space of the process is the set of possible valuations to all its variables. A state is
partitioned into components in the same way as the variables, for instance into a
controlled state and an external state. The initial predicate specifies initial controlled
states, and the transition predicate specifies how the controlled state is changed 
according to the current state. 

In the following discussion, we write $B[X]$ for the set of predicates over
variables in $X$. For a set of variables $X$, we write $X'$ for the corresponding
variables denoting updated values after executing a transition. Furthermore, for
sets of variables $X = \{x_i\}$ and $Y = \{y_i\}$ with the same cardinality,
$X = Y$ denotes $\bigwedge_i x_i = y_i$. For any subset $Z$ of variables $X$ and
$P \in B[X]$, $\exists Z.P$ and $\forall Z.P$ stand for the existential and
universal quantification over the variables in $Z$.

Definition 1. A process $P$ is a tuple $(X, I, T)$ where

- $X = (X_p, X_i, X_e)$ is the (typed) variable declaration. $X_p$, $X_i$, $X_e$
represent the sets of private variables, interface variables and external variables
respectively. We define $X_c = X_p \cup X_i$ to be the controlled variables, and
$X_o = X_i \cup X_e$ to be the observable variables;

- Given a set $X$ of typed variables, a state over $X$ is an assignment of variables
to their values. We define $Q_c$ to be the set of controlled states over $X_c$, $Q_e$
to be the set of external states over $X_e$, $Q = Q_c \times Q_e$ to be the set of states,
and $Q_o$ to be the set of observable states over $X_o$;

- $I \in B[X_c]$ is the initial predicate;

- $T \in B[X, X_c']$ is the transition predicate with the property (called the
asynchronous property) that $(X_c' = X_c) \Rightarrow T$.
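
As an illustration of Definition 1, the following minimal sketch represents a process over small finite domains and checks the asynchronous property by enumeration. The class name `Process`, the helper names, and the two example processes are hypothetical and chosen only for this sketch; the tools discussed in this paper work symbolically rather than by explicit enumeration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterator, Tuple

State = Dict[str, int]


@dataclass(frozen=True)
class Process:
    private: Tuple[str, ...]                 # X_p
    interface: Tuple[str, ...]               # X_i
    external: Tuple[str, ...]                # X_e
    init: Callable[[State], bool]            # I, over the controlled variables
    trans: Callable[[State, State], bool]    # T, over X and X_c'

    @property
    def controlled(self) -> Tuple[str, ...]:
        return self.private + self.interface     # X_c = X_p u X_i

    @property
    def observable(self) -> Tuple[str, ...]:
        return self.interface + self.external    # X_o = X_i u X_e


def valuations(variables, domain=(0, 1)) -> Iterator[State]:
    """Enumerate all assignments of domain values to the given variables."""
    for values in product(domain, repeat=len(variables)):
        yield dict(zip(variables, values))


def is_asynchronous(p: Process, domain=(0, 1)) -> bool:
    """Check (X_c' = X_c) => T: stuttering must be allowed in every state."""
    for s in valuations(p.controlled + p.external, domain):
        stutter = {v: s[v] for v in p.controlled}
        if not p.trans(s, stutter):
            return False
    return True


# Hypothetical example: either stutter, or flip c and copy inp to out.
relay = Process(
    private=("c",), interface=("out",), external=("inp",),
    init=lambda s: s["c"] == 0 and s["out"] == 0,
    trans=lambda s, n: (n["c"] == s["c"] and n["out"] == s["out"])
    or (n["c"] == 1 - s["c"] and n["out"] == s["inp"]),
)
# Hypothetical counterexample: c must flip at every step, so idling is impossible.
eager = Process(
    private=("c",), interface=(), external=(),
    init=lambda s: s["c"] == 0,
    trans=lambda s, n: n["c"] == 1 - s["c"],
)
print(is_asynchronous(relay), is_asynchronous(eager))
```

The second process violates the property because it can never idle, which is exactly the behavior the asynchronous property rules out.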



The asynchronous property says that a process may idle at any step, and thus, 
the speeds of the process and its environment are independent. In order to sup- 
port structured descriptions, we would like to build complex processes from 
simple ones. Three constructs, hide $H$ in $P$, $P \parallel P'$ and $P[X := Y]$, for building
new processes are defined. The hiding operator makes interface variables
inaccessible to other processes, and its judicious use allows more transitions to
be considered internal. The parallel composition operator combines two
processes into a single one. The composition is defined only when the controlled
variables of the two processes are disjoint. The transition predicate of $P \parallel Q$ is
thus the conjunction of the transition predicates of $P$ and $Q$. The renaming operator
$P[X := Y]$ substitutes variables $X$ in $P$ by $Y$.

For a process P, the sets of its executions and observable traces are defined 
in the standard way. Given two processes $P$ and $Q$, we say $P$ refines $Q$, written
$P \preceq Q$, if each observable trace of $P$ is an observable trace of $Q$. Checking the
refinement relation is computationally hard, and we simplify the problem in two ways.
First, our notion of refinement supports an assume-guarantee principle which
asserts that it suffices to establish separately $P_1 \parallel Q_2 \preceq Q_1$ and
$Q_1 \parallel P_2 \preceq Q_2$ in order to prove $P_1 \parallel P_2 \preceq Q_1 \parallel Q_2$.
This principle, similar in spirit to many previous
proposals [23,2,5, 12, 19], is used to reduce the verification of a composition of 
implementation components to individual components, but verifies an individ- 
ual component only in the context of the specifications of the other components. 
The second technique reduces checking language inclusion to verifying transition 
invariants. If the specification has no private variables, an observable implemen- 
tation state corresponds to at most one state in the specification. The refinement 
check then corresponds to verifying that every initial state of P has a correspond- 
ing initial state of Q, and every reachable transition of P has a corresponding 
transition in Q. When Q has private variables, then the correspondence between 
implementation states and specification states should be provided by the user in 
order to make the checking feasible. The user needs to provide a witness W that 
assigns suitable values to the private variables of the specification in terms of 
implementation variables. It can be shown that $P \preceq Q$ follows from establishing
$P \parallel W \preceq Q$. In our setting of asynchronous processes, it turns out that the witness
$W$ itself should not be asynchronous (that is, for asynchronous $W$, $P \parallel W \preceq Q$
typically does not hold). This implies that the standard trick of choosing the
witness to be the subprocess $Q^p$ of $Q$ that updates its private variables, used in
many of the case studies reported in [20, 13], does not work in the asynchronous
setting. As a heuristic for choosing $W$ automatically, we have proposed a
construction that transforms $Q^p$ to $Eager(Q^p)$, which is similar to the subprocess
$Q^p$, but takes a stuttering step only when all other choices are disabled [4]. This
construction is syntactically simple and, as our case studies demonstrate, turns
out to be an effective way of automating witness construction. The complexity
of the resulting check is proportional to the product of $P$ and $Q^p$.
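
The reduction to transition invariants for a specification without private variables can be sketched on explicit states as follows. This is only an illustrative sketch: the function `refines`, its parameters, and the toy models `good` and `bad` are assumptions made for the example, and the actual check described in this paper is symbolic. Because the specification satisfies the asynchronous property, implementation steps that project to a stuttering step are always accepted.

```python
from collections import deque


def refines(impl_init, impl_succ, project, spec_init, spec_trans):
    """Return None if every reachable implementation step has a matching
    specification step after projection, else a counterexample tuple."""
    seen, frontier = set(), deque()
    for s in impl_init:
        if project(s) not in spec_init:
            return ("init", s)
        seen.add(s)
        frontier.append(s)
    while frontier:
        s = frontier.popleft()
        for t in impl_succ(s):
            a, b = project(s), project(t)
            # a == b is a stuttering step, allowed by the asynchronous property
            if a != b and (a, b) not in spec_trans:
                return ("step", s, t)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None


# Hypothetical toy models: implementation states are (pc, out); the
# specification only observes out and allows it to change from 0 to 1.
spec_init, spec_trans = {0}, {(0, 1)}
good = {(0, 0): [(1, 0)], (1, 0): [(2, 1)], (2, 1): []}
bad = {(0, 0): [(1, 1)], (1, 1): [(2, 0)], (2, 0): []}


def observe(state):
    return state[1]


print(refines({(0, 0)}, good.__getitem__, observe, spec_init, spec_trans))
print(refines({(0, 0)}, bad.__getitem__, observe, spec_init, spec_trans))
```

The second call reports the offending implementation step, which plays the role of the error trace mentioned later in the case studies.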

3 Symbolic Search with Hierarchical Reduction 

In this section, we consider the problem of verifying $P \preceq Q$ when $Q$ does not have
any private variables. In this case, if one can check that all reachable $P$ transitions
have corresponding transitions in $Q$, then $P \preceq Q$ holds. Since all variables of $Q$
appear in $P$, the corresponding transitions can be obtained by projection, and
the problem can be solved by an appropriately modified reachability analysis.
The core routine is Next: given a process $P$ and a set $R$ of its states, Next($P$, $R$)
returns the set $T$ of transitions of $P$ starting in $R$ along with the set $S$
of successors of R. There is, however, a practical problem if one intends to 
implement the successor function Next with existing BDD packages. Since Next 
needs to return the set of transitions, early quantification, an essential technique 
for image computation, is less effective. In [6], we have reported a heuristic to 
improve the enumerative search algorithm. In this section, we propose a symbolic 
algorithm to implement it. 

We use Next $P$ to represent the process obtained by merging “invisible”
transitions of $P$, where invisibility is defined to be both write-invisible (not writing
to interface variables) and read-invisible (not reading from external variables).
Let $T \in B[X_p, X_i, X_e, X_p', X_i']$ be a transition predicate (the primed variables
denote the updated values). The write-invisible transitions are captured by the
predicate $T \wedge (X_i = X_i')$ (the second clause says that the interface variables
stay unchanged) and read-invisible transitions correspond to $\forall X_e.T$ (the
quantification ensures that the transition is not dependent on external variables).
Thus, the invisible component of $T$ is $T^i = T \wedge (X_i = X_i') \wedge \forall X_e.T$, and the
visible component is $T^v = T \wedge \neg T^i$. Define the concatenation $T_1 \bowtie T_2$ of two
transition predicates $T_1, T_2 \in B[X, X']$ to be $\exists Z.\, T_1[X' \leftarrow Z] \wedge T_2[X \leftarrow Z]$.

Definition 2. Let $P = ((X_p, X_i, X_e), I, T)$ be a process. Define Next $P$ =
$((X_p, X_i, X_e), I, T')$ with $T' = (X_c = X_c') \vee \mu S.\,(T^v \vee (T^i \bowtie S))$. ■

The transition predicate of Next $P$ is equivalent to $(X_c = X_c') \vee T^v \vee (T^i \bowtie
T^v) \vee (T^i \bowtie T^i \bowtie T^v) \vee \cdots$. In other words, a transition in Next $P$ is either
a stuttering transition, or zero or more invisible transitions followed by a visible
transition of $P$.
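
The effect of the Next operator can be illustrated on an explicit transition graph. The sketch below is an assumption-laden simplification: the function name `next_transitions` and the toy graph are hypothetical, invisibility is supplied as a caller-provided predicate, the example only models write-invisibility (the observable `out` component stays unchanged), and the stuttering transition is omitted.

```python
def next_transitions(trans, is_invisible, sources):
    """Compressed transitions (s, t): t is reached from s by zero or more
    invisible steps followed by exactly one visible step."""
    result = set()
    for s in sources:
        stack, seen = [s], {s}
        while stack:
            u = stack.pop()
            for v in trans.get(u, ()):
                if is_invisible(u, v):
                    # keep following invisible steps, guarding against cycles
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
                else:
                    # a visible step ends the compressed transition
                    result.add((s, v))
    return result


# Hypothetical toy graph over states (pc, out).
graph = {
    (0, 0): [(1, 0)],
    (1, 0): [(2, 0), (4, 1)],
    (2, 0): [(3, 1)],
}


def out_unchanged(u, v):
    return u[1] == v[1]


compressed = next_transitions(graph, out_unchanged, [(0, 0)])
print(sorted(compressed))
```

In this toy graph the intermediate states (1, 0) and (2, 0) never appear as endpoints of a compressed transition starting in (0, 0), which is the source of the state-space savings.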

It can be shown that Next P and P are equivalent (modulo stuttering). 
Furthermore, the Next operator is congruent [4]. This allows us to apply the 
Next operator to every subprocess of a process constructed by parallel compo- 
sition, hiding and instantiation. We proceed to describe a symbolic algorithm for 
state-space analysis of a process expression with nested applications of Next, 
without precomputing the transition relations of the subprocesses (such a pre- 
computation would require an expensive transitive closure computation).

funct Next(M, R) =
  if $M \equiv P$
    then helper := $\lambda Q.$ let $Q_c := Q[X^P]$
                                   $T_c := T_P \wedge Q_c$
                                   $R' := (\exists X^P.\, T_P \wedge Q)[X'^P \leftarrow X^P]$
                                   $R'' := R' \setminus cache$
                                   $cache := cache \vee R'$
                               in $(T_c, R'')$
         return NextAux(P, helper, R)
  elsif $M \equiv M_1 \parallel M_2$
    then $(T_1, N_1)$ := Next($M_1$, R)
         $(T_2, N_2)$ := Next($M_2$, R)
         $S_1 := (\exists X'^{M_1}.\, T_1) \wedge (X'^{M_1} = X^{M_1})$
         $S_2 := (\exists X'^{M_2}.\, T_2) \wedge (X'^{M_2} = X^{M_2})$
         $T := (T_1 \wedge S_2) \vee (T_2 \wedge S_1) \vee (T_1 \wedge T_2)$
         $N' := (\exists X^M.\, R \wedge T_1 \wedge T_2)[X'^M \leftarrow X^M]$
         $N := N_1 \vee N_2 \vee N'$
         return (T, N)
  elsif $M \equiv$ hide $Y$ in $M_1$
    then helper := $\lambda Q.$ Next($M_1$, Q)
         return NextAux(M, helper, R)

Fig. 1. Algorithm Next.



The algorithm Next (figure 1) computes the visible transitions of a process
$M$ from the current states $R$ by proceeding according to the structure of $M$.
For each case, a tuple of transitions and a set of new states is returned. Each
atomic process takes its turn to update its controlled variables as the algorithm
traverses the expression. Whenever a state is reached by the current exploration,
we check if it has been visited. If not, the state is added to the newly reached states.
The transition compression of subprocesses is performed by applying Next
implicitly in the cases of atomic processes and hiding. This is achieved by invoking the
function NextAux to merge invisible transitions in these two cases. For parallel
composition $M_1 \parallel M_2$, it is not necessary to do so since variable visibility remains
the same. Therefore, the algorithm simply invokes itself recursively to obtain
transitions $T_1$ and $T_2$ corresponding to subprocesses $M_1$ and $M_2$ respectively,
and computes the composed transitions for the following three cases: (1) $M_1$
takes a transition in $T_1$ and $M_2$ stutters; (2) $M_2$ takes a transition in $T_2$ and $M_1$
stutters; and (3) both $M_1$ and $M_2$ take transitions in $T_1$ and $T_2$ respectively.
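
The three cases for parallel composition can be sketched on explicit local transitions. The function `compose` and its arguments are hypothetical names for this illustration, and the sketch assumes the two components do not read each other's variables, which the symbolic algorithm above does not require.

```python
def compose(t1, t2, s1, s2):
    """Joint transitions from the joint state (s1, s2), given the local
    transition sets t1 of M1 and t2 of M2 as sets of (source, target)."""
    moves1 = [d for (s, d) in t1 if s == s1]
    moves2 = [d for (s, d) in t2 if s == s2]
    joint = set()
    # case 1: M1 takes a transition and M2 stutters
    joint |= {((s1, s2), (d1, s2)) for d1 in moves1}
    # case 2: M2 takes a transition and M1 stutters
    joint |= {((s1, s2), (s1, d2)) for d2 in moves2}
    # case 3: both components take a transition
    joint |= {((s1, s2), (d1, d2)) for d1 in moves1 for d2 in moves2}
    return joint


steps = compose({("a", "b")}, {("x", "y")}, "a", "x")
print(sorted(steps))
```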

For atomic processes and the case of hiding, the helper function is given
to NextAux as a parameter. It returns transitions and new states of the
subprocess before Next is applied. For hiding, the helper function simply returns
the transitions and new states of $M_1$, and the algorithm Next lets NextAux do
the transition compression. For an atomic process, the helper function computes
transitions $T_c$ and new states $R''$. It then returns the transitions and newly
reached states after updating cache.

comment: helper returns a tuple of lower-level transitions
comment: and newly reached states from the given set of states.
funct NextAux(M, helper, R) =
  N := false
  T := false
  I := true
  Q := R
  do
    $(T', N')$ := helper(Q)
    $T^i := (T' \wedge (X_i'^M = X_i^M)) \wedge (\forall X_e^M.\, T')$
    $T := (I \bowtie T') \vee T$
    $I := I \bowtie T^i$
    $Q' := (\exists X^M.\, Q \wedge T^i)[X'^M \leftarrow X^M]$
    $Q := N' \wedge Q'$
    $N := N \vee (N' \setminus Q)$
  while $Q \neq \emptyset$
  return (T, N)

Fig. 2. Algorithm NextAux.

Figure 2 shows the NextAux algorithm for invisible transition compression.
The naive fixed-point computation hinted at in definition 2 is expensive and
unnecessary. Rather than computing fixed points, our algorithm generates the
transition predicate of Next $P$ on the fly by considering only the current states. The
idea is to compute $T^i \bowtie \cdots \bowtie T^i \bowtie T^v$ incrementally until all visible transitions
reachable from the current states are generated. Several variables are kept by
the algorithm to perform the task. $N$ accumulates newly reached states in each
iteration, $T$ consists of compressed transitions, $I$ is the concatenation of
consecutive invisible transitions and $Q$ is the set of states reached by invisible
transitions in the current iteration.

The algorithm NextAux first computes the invisible component $T^i$ in $T'$. The
new transitions $T'$ are added to $T$ after being concatenated with previous invisible
transitions. The concatenated invisible transition $I$ is updated by appending $T^i$.
To compute states for the next iteration, the set $Q'$ of all states reached by
current invisible transitions is generated. The new states $Q$ reached by invisible
transitions are the intersection of the newly reached states $N'$ and invisible states
$Q'$. Finally, the visible states of $N'$ are put into the new visible states $N$. The
main correctness argument about the algorithm is summarized by:

Theorem 1. Let $M$ be a process, $R \in B[X^M]$, and suppose Next($M$, $R$) returns
$(T, N)$. Then the predicate $T \wedge R$ captures the transitions of Next $M$ starting in
$R$, and $N$ contains all successor states of $R$ that are not previously visited. ■

Implementation. The symbolic algorithm for refinement checking is
implemented in the model checker Mocha [3]. The implementation is in Java using
the BDD packages from VIS [7]. The transition predicate is maintained in a
conjunctive form. The details are omitted here due to lack of space.

Event
  Up    : lower layer is Up
  Down  : lower layer is Down
  Open  : administrative Open
  Close : administrative Close
  TO+   : Timeout with counter > 0
  TO-   : Timeout with counter expired
  RCR+  : Receive-Configure-Request (Good)
  RCR-  : Receive-Configure-Request (Bad)
  RCA   : Receive-Configure-Ack
  RCN   : Receive-Configure-Nak/Rej
  RTR   : Receive-Terminate-Request
  RTA   : Receive-Terminate-Ack

Action
  tlu : This-Layer-Up
  tld : This-Layer-Down
  tls : This-Layer-Started
  tlf : This-Layer-Finished
  irc : Initialize-Restart-Count
  zrc : Zero-Restart-Count
  scr : Send-Configure-Request
  sca : Send-Configure-Ack
  scn : Send-Configure-Nak/Rej
  str : Send-Terminate-Request
  sta : Send-Terminate-Ack

Fig. 3. The PPP Option Negotiation Automaton.

4 Verification of Network Protocols 

4.1 Point-to-Point Protocol 

Point-to-Point Protocol (PPP) is designed to transmit multi-protocol datagrams
for point-to-point communications [22]. To establish the connection, each end
sends Link-layer Control Protocol (LCP) packets to configure and test the data
link. Authentication may follow after the link is established. Then PPP
sends Network Control Protocol (NCP) packets to choose and configure network-layer
protocols. The link will be disconnected if explicit LCP or NCP packets close
it, or certain external events occur (for instance, the modem is turned off). In this
case study, we focus on checking an implementation of the option negotiation
automaton (section 4 in [22]) for link establishment.

Protocol RFC Specification. Figure 3 reproduces the transition table of the 
automaton as shown in section 4.1 of the specification. As one can see from the 
table, events and actions are denoted by symbols. Each entry in the table
shows the actions and the new state of the automaton. If there are multiple
actions to be performed in a state, they are executed in an arbitrary order.

static void
fsm_rtermack(f)
    fsm *f;
{
    switch (f->state) {
    /* other cases here */
    case OPENED:
        if (f->callbacks->down)
            (*f->callbacks->down)(f);   /* Inform upper layers */
        fsm_sconfreq(f, 0);
        break;
    }
}



Fig. 4. Code-style in fsm.c. 



When initiating a PPP connection, the host first sends a configuration re- 
quest packet (scr) to its peer and waits for the acknowledgment (RCA or RCN). 
The peer responds by checking the options sent in the request. If the options are
acceptable, the peer sends a positive acknowledgment (sca). Otherwise, a
negative acknowledgment (scn) is sent to the host. In any case, the peer also sends its
configuration request packet to the host. They try to negotiate options
acceptable to both of them. After they agree on the options, both move to the Opened
state and start the authentication phase (or data transmission, if authentication is
not required). The communication can be terminated explicitly by a Close event
or by a Down event (perhaps due to hardware failure). A termination request (str)
is sent if the link is closed explicitly. A restart counter is used to monitor the
responses to request actions (scr and str). If the host has not received the
acknowledgment from the peer when the timer expires, it sends another request if
the counter is greater than zero. Otherwise, it stops the connection locally.

Implementation. The implementation ppp version 2.4.0¹ is an open-source
package included in various Linux distributions and widely used by Linux users.
The package also contains several tools for monitoring and maintaining PPP
connections. The daemon pppd implements the protocol and is our concern
here. The file main.c uses the subroutines defined in fsm.c to maintain the
finite state machine. Events and actions have their corresponding subroutines in
fsm.c. In this work, we assume events and actions are handled correctly.
Therefore we leave them as symbols as in the specification. Figure 4 shows how the
program behaves on event RTA (receive terminate acknowledgment). For each
state that can handle the RTA event, a case statement is put in the subroutine.
For instance, if RTA is received when the state is Opened, it informs the
upper layers, sends a configuration request (fsm_sconfreq) and returns. There
are 2,589 lines in the files main.c and fsm.c.

¹ Available at ftp://ftp.linuxcare.com.au/pub/ppp/ppp-2.4.0.tar.gz.

Modeling. Once we have defined the constants for events and actions, it is
easy to construct a process for the automaton. The following guarded command
(written in the language of Mocha [3]) models the behavior when the state is
Opened and the event RTA occurs (figure 3).

[] state = Opened & in_p = in_v & evt = RTA & out_p ~= out_v -> 
act’ := scr; out_p’ := out_v; counter’ := dec counter by 1; 
state’ := Req_Sent; in_v’ := ~in_p 

The variable state denotes the current state, evt the event, and act the 
action. The variable counter represents the restart counter. It is decremented 
by one if the action scr is performed. The variables in_p and in_v model the 
input channel: the channel is empty if and only if they are equal. Similarly, out_p 
and out_v are for the output channel. 
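
To make the reading of the specification's guarded command concrete, the following sketch transliterates it into a plain function over a dictionary of variables. The function name `opened_rta` and the sample state are hypothetical; the sketch only mirrors the guard and the simultaneous assignments of the command above and is not part of the Mocha model.

```python
from typing import Optional


def opened_rta(s: dict) -> Optional[dict]:
    """Apply the Opened/RTA command; return None when the guard is false."""
    guard = (s["state"] == "Opened" and s["in_p"] == s["in_v"]
             and s["evt"] == "RTA" and s["out_p"] != s["out_v"])
    if not guard:
        return None
    nxt = dict(s)
    # All primed variables are computed from the unprimed (current) values.
    nxt.update(act="scr", out_p=s["out_v"], counter=s["counter"] - 1,
               state="Req_Sent", in_v=not s["in_p"])
    return nxt


# Hypothetical sample state satisfying the guard.
before = {"state": "Opened", "evt": "RTA", "act": None, "counter": 3,
          "in_p": True, "in_v": True, "out_p": False, "out_v": True}
after = opened_rta(before)
print(after["state"], after["act"], after["counter"])
```

After the step, `out_p` equals `out_v` and `in_v` differs from `in_p`, so each channel's pair of variables has switched between the equal and unequal configurations.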

For the corresponding implementation (figure 4), more variables are needed
to model the program and recover traces faithfully. We use the variable addr
to record which subroutine is modeled by the current transition. The boolean
variable timer is used to model the timeout event: if timer is true and the
program is in the main loop, it may go to the timeout handler. Other variables
have the same meaning as those in the specification model.

[] addr = rtermack & state = Opened & out_p ~= out_v -> 

act’ := scr; out_p’ := out_v; timer’ := true; counter’ := 2; 
in_v’ := ~in_p; addr’ := input

Another process Link is used to model the network channel. It accepts an 
action from one automaton, translates it to an event, and forwards the event to 
the other automaton. We manually translate the C program to reactive modules. 
Since the program is well-organized (as seen in figure 4), it may be possible to 
translate it automatically. The resulting description in Mocha contains 442 lines 
of code (182 lines for pppd and 260 lines for the specification). 

Verification. Having built the models of the specification and implementation,
we wish to apply the refinement check. However, certain aspects of the
specification are not explicitly present in the implementation. For instance, the
automaton is able to send a couple of packets in any order if it is in the state Stopped
on event RCR+ or RCR-. Two variables are introduced to record which packets
have been sent. These variables do not appear in the C program but only in the
specification model. As discussed earlier, we need a witness to define these
specification variables in terms of the implementation variables. We follow the heuristic
suggested in [4], use the eager witness $E$, and check whether
$\mathit{pppd} \parallel E \preceq \mathit{ppp}$, where pppd and ppp are the formal models of
implementation and specification respectively. However, this refinement relation
does not hold. It fails because pppd is
built with the assumption that it communicates with another PPP automaton.
Consequently, we try to establish
$\mathit{pppd}_0 \parallel \mathit{link} \parallel \mathit{pppd}_1 \preceq \mathit{ppp}_0$, where $\mathit{pppd}_0$ and $\mathit{pppd}_1$
are instances of the implementation model pppd, and link is the model of the
network channel. Using assume-guarantee reasoning, in conjunction with the
witness module, this verification goal can be simplified to

    $\mathit{pppd}_0 \parallel \mathit{link} \parallel \mathit{ppp}_1 \parallel E \preceq \mathit{ppp}_0$.

This amounts to establishing that the implementation pppd refines the
specification ppp assuming the communication partner satisfies the specification and
using $E$ as a witness for the private variables of the specification.

Analysis Result. To check the refinement obligation, we use a prototype built 
on top of the model checker Mocha [3]. It produces a trace which describes 
an erroneous behavior of the implementation. The bug can be seen in the code 
segment shown in figure 4. On receiving RTA at state Opened, the automaton 
should bring down the link (tld), send a configuration request (scr) and go to 
state Req-Sent. However, the implementation does not update the state after 
it brings down the link and sends the request. In almost all circumstances, the 
bug is not significant. It can only be observed if the user tries to open the link 
instantaneously after the disconnection. Our translation lets us trace the bug 
in the C program easily. After we fix the bug, the refinement relation can be 
established. 

In terms of the computational requirements of the refinement check, in
comparison to the IWLS image package available in VIS [7], our algorithm requires less
memory: while the maximum MDD size with the IWLS package is 265,389 nodes,
with our optimized algorithm the corresponding size is 188,544 nodes, a saving of
about 30%. The IWLS package takes 5294.95 s to finish, while ours takes 2318.87 s, a
saving of 56%.
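
The quoted savings follow directly from the reported figures. The short computation below is only a worked check; the helper name `saving` is introduced here for illustration.

```python
def saving(baseline: float, ours: float) -> float:
    """Relative saving of `ours` against `baseline`, as a percentage."""
    return 100.0 * (baseline - ours) / baseline


mdd_saving = saving(265_389, 188_544)    # maximum MDD size in nodes
time_saving = saving(5294.95, 2318.87)   # running time in seconds
print(f"MDD nodes saved: {mdd_saving:.1f}%")
print(f"Time saved: {time_saving:.1f}%")
```

The node count drops by roughly 29%, which the text rounds to about 30%, and the running time drops by roughly 56%.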

4.2 Dynamic Host Configuration Protocol 

The Dynamic Host Configuration Protocol (DHCP) provides a standard
mechanism to obtain configuration parameters. It is widely used in mobile
environments, especially for network address allocation. The protocol is designed based
on the client-server model. Hosts which provide network parameters are called
servers. They are configured by network administrators with consistent
information. Clients, on the other hand, communicate with servers and obtain proper
parameters to be a host in the network. In a typical scenario, a laptop obtains its
network address after it is plugged into any network recognizing DHCP. The user
can then access the network without filling in network parameters manually.

The DHCP specification [11] only describes the state machine informally.
The state-transition diagram found in section 4.4 of [11] gives a global view of the
protocol. The details are written in English and scattered around the document.
The dhcp package version 2.0 patch level 5² is the standard implementation
distributed by Internet Software Consortium. We are interested in knowing whether
the client (dhclient.c) is implemented correctly. The implementation does not
appear to follow the specification strictly. For instance, it lacks two of the states
shown in the state diagram. As a result, it is much more challenging to write
down formal models for the specification and implementation in this case than
for PPP. We adopt the same style and build four processes: the client
specification client, the client implementation dhclient, the server server and the
communication channel link. Since the implementation performs transitions in
several stages, an eager module is introduced to resolve the timing difference. To
make the model more realistic, we make the channel link lossy. We do not find
any inconsistency during the check
$\mathit{dhclient} \parallel \mathit{link} \parallel \mathit{server} \parallel E \preceq \mathit{client}$.

² Available at http://www.isc.org/products/DHCP/dhcp-v2.html.

In terms of the computational requirements of the refinement check, while the
maximum MDD size with the IWLS package is 13,692 nodes, with our optimized
algorithm the corresponding size is 29,192 nodes. However, the IWLS package takes
350.84 s in comparison to 82.70 s for our algorithm. Our algorithm thus takes 76%
less time but needs about 113% more space (equivalently, the IWLS maximum is
53% smaller than ours). We speculate that the dynamic ordering algorithm
causes this anomaly; further investigation is needed.

5 Conclusions 

The main contribution of this paper is establishing the applicability of the refinement
checking methodology to the verification of implementations of network protocols
with respect to RFC documents. The relevance of the various steps in the
methodology is supported by two case studies involving popular protocols, with 
an inconsistency discovered in one case. We have also proposed a symbolic search 
algorithm for compressing internal transitions in a hierarchical manner, and 
established the resulting savings in memory requirements. 

In both case studies, the model extraction was done manually. This is un- 
avoidable for extracting specification models since RFC documents typically 
describe the protocols in a tabular, but informal, format. As far as automating 
the generation of implementation models from C-code, the emerging technology 
for model extraction [8, 16, 9, 21] can be useful.

CCR97-34115, by SRC contract 99-TJ-688, by Bell Laboratories, Lucent
Technologies, and by Sloan Faculty Fellowship.

References 

1. M. Abadi and L. Lamport. The existence of refinement mappings. Theoretical 
Computer Science, 82(2):253-284, 1991. 

2. M. Abadi and L. Lamport. Composing specifications. ACM TOPLAS, 15(1):73- 
132, 1993. 

3. R. Alur, L. de Alfaro, R. Grosu, T. Henzinger, M. Kang, R. Majumdar, F. Mang,
C. Kirsch, and B. Wang. MochA: A model checking tool that exploits design
structure. In Proceedings of 23rd Intl. Conference on Software Engineering, 2001.

4. R. Alur, R. Grosu, and B.-Y. Wang. Automated refinement checking for
asynchronous processes. In Proc. Third Intl. Workshop on Formal Methods in
Computer-Aided Design. Springer, 2000.

5. R. Alur and T. Henzinger. Reactive modules. Formal Methods in System Design,
15(1):7-48, 1999.

6. R. Alur and B.-Y. Wang. “Next” heuristic for on-the-fly model checking. In
CONCUR’99: Concurrency Theory, Tenth Intl. Conference, LNCS 1664, pages
98-113. Springer, 1999.







7. R. Brayton, G. Hachtel, A. Sangiovanni-Vincentelli, F. Somenzi, A. Aziz, S. Cheng,
S. Edwards, S. Khatri, Y. Kukimoto, A. Pardo, S. Qadeer, R. Ranjan, S. Sarwary,
T. Shiple, G. Swamy, and T. Villa. VIS: A system for verification and synthesis. In
Proc. Eighth Intl. Conference on Computer Aided Verification, LNCS 1102, pages
428-432. Springer-Verlag, 1996.

8. J. Corbett, M. Dwyer, J. Hatcliff, S. Laubach, C. Pasareanu, Robby, and H. Zheng.
Bandera: Extracting finite-state models from Java source code. In Proceedings of
22nd Intl. Conference on Software Engineering, pages 439-448. 2000.

9. S. Das, D. Dill, and S. Park. Experience with predicate abstraction. In Computer
Aided Verification, 11th Intl. Conference, LNCS 1633, pages 160-171. Springer,
1999.

10. J. Fernandez, H. Garavel, A. Kerbrat, R. Mateescu, L. Mounier, and M.
Sighireanu. CADP: A protocol validation and verification toolbox. In Proc. Eighth Intl.
Conference on Computer-Aided Verification, LNCS 1102. Springer-Verlag, 1996.

11. R. Droms. Dynamic Host Configuration Protocol, March 1997. RFC 2131.

12. O. Grumberg and D. Long. Model checking and modular verification. ACM
Transactions on Programming Languages and Systems, 16(3):843-871, 1994.

13. T. Henzinger, X. Liu, S. Qadeer, and S. Rajamani. Formal specification and
verification of a dataflow processor array. In Proc. Intl. Conference on Computer-aided
Design, pages 494-499, 1999.

14. T. Henzinger, S. Qadeer, and S. Rajamani. You assume, we guarantee: Methodol- 
ogy and case studies. In CAV 98: Computer-aided Verification, LNCS 1427, pages 
521-525, 1998. 

15. G. Holzmann. The model checker SPIN. IEEE Trans. on Software Engineering,
23(5):279-295, 1997. 

16. G. Holzmann and M. H. Smith. Software model checking - extracting verifica- 
tion models from source code. In Formal Methods for Protocol Engineering and 
Distributed Systems, pages 481-497, Kluwer Academic Publ., 1999. 

17. R. Kurshan. Computer-aided Verification of Coordinating Processes: the automata- 
theoretic approach. Princeton University Press, 1994. 

18. N. Lynch and M. Tuttle. Hierarchical correctness proofs for distributed algorithms. 
In Proc. Seventh ACM Symposium on Principles of Distributed Computing, pages 
137-151, 1987. 

19. K. McMillan. A compositional rule for hardware design refinement. In CAV 97:
Computer-Aided Verification, LNCS 1254, pages 24-35, 1997.

20. K. McMillan. Verification of an implementation of Tomasulo’s algorithm by
compositional model checking. In CAV 98: Computer-Aided Verification, LNCS 1427,
pages 110-121, 1998.

21. K. Namjoshi and R. Kurshan. Syntactic program transformations for automatic
abstraction. In Computer Aided Verification, 12th Intl. Conference, LNCS 1855,
pages 435-449. Springer, 2000.

22. W. Simpson. The Point-to-Point Protocol. Computer Systems Consulting Services,
July 1994. STD 51, RFC 1661.

23. E. Stark. A proof technique for rely-guarantee properties. In Found. of Software
Technology and Theoretical Computer Science, LNCS 206, pages 369-391, 1985.




Automatic Abstraction for Verification of 
Timed Circuits and Systems* 



Hao Zheng, Eric Mercer, and Chris Myers 

University of Utah, Salt Lake City UT 84112, USA 
{hao, eemercer, myers}@vlsigroup.elen.utah.edu
http://async.elen.utah.edu



Abstract. This paper presents a new approach for verification of asyn- 
chronous circuits by using automatic abstraction. It attacks the state 
explosion problem by avoiding the generation of a flat state space for the 
whole design. Instead, it breaks the design into blocks and conducts ver- 
ification on each of them. Using this approach, the speed of verification 
improves dramatically. 



1 Introduction 

In order to continue to produce circuits of increasing speed, designers are con- 
sidering aggressive circuit design styles such as self-resetting or delayed-reset 
domino circuits. These design styles can achieve a significant improvement in 
circuit speed as demonstrated by their use in a gigahertz research microprocessor
(guTS) at IBM [15]. Designers are also considering asynchronous circuits due
to their potential for higher performance and lower power as demonstrated by 
the RAPPID instruction length decoder designed at Intel m- This design was 3 
times faster while using only half the power of the synchronous design. The cor- 
rectness of these new timed circuit styles is highly dependent upon their timing, 
so extensive timing verification is necessary during the design process. Unfortu- 
nately, these new circuit styles cannot be efficiently and accurately verified using 
traditional static timing analysis methods. This lack of efficient analysis tools is 
one of the reasons for the lack of mainstream acceptance of these design styles. 

The formal verification of timed circuits often requires state space exploration 
which can explode even for modest-size examples. To reduce the complexity
incurred by state exploration, abstraction is necessary. In earlier work, safe
approximations of internal signal behavior are found to reduce the size of the
state space, but these methods are still exponential in the number of memory
elements. In VIS, non-determinism is used to abstract the behavior of some
circuit signals, but it is often too conservative and can introduce unreachable
states which may exhibit hazards. A model checker based on hierarchical
reactive machines has also been proposed. By taking advantage of the hierarchy
information, it only tracks active variables so that the state space is reduced
and verification time is improved, but this approach is best suited for software,
which has a more sequential nature. An abstraction technique has also been
proposed for validation coverage

* This research is supported by NSF CAREER award MIP-9625014, SRC contract 
97-DJ-487 and 99-TJ-694, and a grant from Intel Corporation. 



G. Berry, H. Comon, and A. Finkel (Eds.): CAV 2001, LNCS 2102, pp. 182-^2SI 2001. 
(c) Springer-Verlag Berlin Heidelberg 2001



Automatic Abstraction for Verification of Timed Circuits and Systems 






analysis and automatic test generation. It removes all datapath elements which
do not affect the control flow and groups the equivalent transitions together,
resulting in a dramatic reduction in the state space. It is difficult, however, to
distinguish the control from the datapath without help from the designers. An
abstraction approach for the design of speed-independent asynchronous
circuits from change diagrams has also been described. In this approach, each
subcircuit is designed individually, and they are then recombined to produce the
final circuit. This approach, however, does not address timing issues. A
divide-and-conquer method has been presented for the synthesis of asynchronous
circuits. This method breaks up the state graph into a number of simpler
subgraphs for each output, and each subgraph is solved individually. The results
are then integrated to construct the final solution. This method, however,
requires a complete state graph to start with. An assume-guarantee reasoning
strategy has also been shown. In such cases, when verifying a component in a
system, assumptions need to be made about the behavior of other components,
and these assumptions are discharged when the correctness of other components
is established. While our approach is similar to assume-guarantee reasoning, it
does not require assumptions about the other components because their behavior
is derived from the specifications using semantics-preserving abstraction.
Belluomini describes the verification of domino circuits using ATACS. She shows
that verifying flat circuits even of a moderate size can be very difficult, while the
verification can be completed quickly using hand abstractions. However, these
hand abstractions require an expert user, and methods must be developed to
check that the abstractions are a reliable model of the underlying behavior. This
is the major motivation of this work.

Our approach begins with a high-level language, such as VHDL, that models a 
system hierarchically. The method then compiles each individual component into 
a timed Petri-net for verification. This paper proposes an abstraction technique 
applied to timed Petri-nets. This approach partitions the design into small blocks 
using specified structural information, and each block is verified separately. We 
have proven that under certain constraints if each block is verified to be correct, 
then the complete system is also correct. Our results show that taking advantage 
of the hierarchical information results in a substantial savings in verification 
time. 



2 Timed Petri-Nets and Basic Trace Theory 

Timed Petri-nets (TPNs) are the graphical model to which our high-level
specification is compiled. A one-safe TPN is modeled by the tuple
$\langle P, T, F, M_0, \Delta \rangle$ where $P$ is the set of places, $T$ is the set of
transitions, $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation,
$M_0 \subseteq P$ is the initial marking, and $\Delta$ is an assignment of
timing constraints to places. There are three kinds of transitions: $s+$ changes
signal $s$ from 0 to 1, $s-$ changes $s$ from 1 to 0, and $\$$, which is a sequencing
transition. A marking is a subset of places. For a place $p \in P$, the preset of $p$
(denoted $\bullet p$) is the set of transitions connected to $p$
(i.e., $\bullet p = \{t \in T \mid (t,p) \in F\}$), and the postset of $p$ (denoted
$p\bullet$) is the set of transitions to which $p$ is connected
(i.e., $p\bullet = \{t \in T \mid (p,t) \in F\}$). Presets and postsets for transitions
are similarly defined. A timing constraint consisting of a lower and upper bound
is associated with each place in the TPN (i.e., $\Delta(p_i) = \langle l_i, u_i \rangle$).
The lower bound is a non-negative integer while the upper bound is an integer
greater than or equal to the lower bound or $\infty$. A benchmark for timed circuit
design is the STARI communication circuit, which is used to communicate between
two synchronous systems that are operating at the same clock frequency, but are
out-of-phase due to clock skew. The STARI circuit is essentially a FIFO connecting
the two systems. A portion of the TPN for a STARI circuit with 2 FIFO stages is
shown in Figure 1(a). To simplify the diagram, places between transitions have been
removed. A token indicates that the place is initially marked.

Fig. 1. Portion of the TPN for a STARI Circuit Composed of two FIFO Stages.

A transition $t$ is enabled in a marking $M$ if $\bullet t \subseteq M$. A timer is
associated with each place $p \in M$. For each $p \in P$, $\mathit{timer}(p)$ is
initialized to zero when $p$ is put into the marking. All timers in a marking
increase uniformly. Let $\mathit{lower}(p)$ and $\mathit{upper}(p)$ be the lower and
upper bounds of the timing constraints of $p \in P$. For a $p \in M$,
$\mathit{timer}(p)$ is satisfied if $\mathit{timer}(p) \ge \mathit{lower}(p)$;
$\mathit{timer}(p)$ is expired if $\mathit{timer}(p) > \mathit{upper}(p)$. A transition
$t$ cannot occur until it is enabled in a marking and $\mathit{timer}(p)$ is satisfied
for all $p \in \bullet t$. A transition $t$ must fire before $\mathit{timer}(p)$
is expired for all $p \in \bullet t$. Firing a transition $t$ changes the current
marking $M$ to a new marking $M' = (M - \bullet t) \cup t\bullet$, where
$\mathit{timer}(p) = 0$ for all $p \in t\bullet$. The net is 1-safe if
$(M - \bullet t) \cap t\bullet = \emptyset$.
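The firing rule above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the ATACS implementation: the class name `TPN`, its fields, and the tiny net `net` are hypothetical, and time is advanced explicitly by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class TPN:
    """Illustrative one-safe timed Petri net (all names are hypothetical)."""
    preset: dict    # transition -> set of input places
    postset: dict   # transition -> set of output places
    bounds: dict    # place -> (lower, upper); upper may be float("inf")
    marking: set = field(default_factory=set)
    timer: dict = field(default_factory=dict)

    def enabled(self, t):
        # t is enabled when every place in its preset is marked.
        return self.preset[t] <= self.marking

    def can_fire(self, t):
        # Enabled, and every preset timer has reached its lower bound.
        return self.enabled(t) and all(
            self.timer[p] >= self.bounds[p][0] for p in self.preset[t])

    def advance(self, dt):
        # All timers in the marking increase uniformly.
        for p in self.marking:
            self.timer[p] += dt

    def fire(self, t):
        if not self.can_fire(t):
            raise ValueError(f"{t} is not enabled with satisfied timers")
        if any(self.timer[p] > self.bounds[p][1] for p in self.preset[t]):
            raise ValueError(f"a timer in the preset of {t} has expired")
        remaining = self.marking - self.preset[t]
        if remaining & self.postset[t]:
            raise ValueError("firing would violate 1-safety")
        for p in self.preset[t]:
            del self.timer[p]
        self.marking = remaining | self.postset[t]
        for p in self.postset[t]:
            self.timer[p] = 0  # timers of newly marked places start at zero


# Hypothetical two-place net: p0 --t1--> p1, with bounds [2, 5] on p0.
net = TPN(
    preset={"t1": {"p0"}},
    postset={"t1": {"p1"}},
    bounds={"p0": (2, 5), "p1": (0, float("inf"))},
    marking={"p0"},
    timer={"p0": 0},
)
```

With this net, `t1` cannot fire at time 0, becomes fireable after `net.advance(2)`, and firing it moves the token from `p0` to `p1` with a fresh timer.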

The timing properties of a system are specified using a set of constraint 
places. Constraint places never actually enable a transition to fire. Instead, the 
constraint places are checked each time a transition fires in a marking. Failures 
caused by constraint places arise due to three conditions: 

1. There exists a constraint place $p \in \bullet t$ such that $p \notin M$ when firing $t$.

2. $\mathit{timer}(p)$ is not satisfied for some constraint place $p \in \bullet t$ when firing $t$.

3. $\mathit{timer}(p)$ is expired for some constraint place $p \in \bullet t$ before firing $t$.

The dynamic behavior of a Petri net can be studied using reachability analysis.
A marking $M_n$ is said to be reachable from a marking $M_0$ if there exists
a sequence of firings that changes $M_0$ to $M_n$. A firing sequence or run from a
marking $M_0$ is defined as
$\rho = M_0 \xrightarrow{t_1} M_1 \xrightarrow{t_2} M_2 \cdots \xrightarrow{t_n} M_n$.
A sequence of transitions generated by a firing sequence $\rho$ is called a trace.
In the above example, $M_n$ is reachable from $M_0$ through a trace
$t_1 t_2 \ldots t_n$. Let $X$ be the set of all possible traces produced by a Petri
net $N$. $X$ is prefix-closed and always includes the empty trace $\epsilon$.

The same concept can be extended to TPNs. A state $S$ of a TPN is a pair
$(M, \mathit{timer})$, where $M$ is a marking and $\mathit{timer}$ is a function giving
the timer value $\mathit{timer}(p)$ for all $p \in M$. The initial state $S_0$ is
$(M_0, \mathit{timer}_0)$, where $\mathit{timer}_0(p) = 0$ for all $p \in M_0$. In a
state $S = (M, \mathit{timer})$, a transition $t$ can fire if $t$ is enabled in $M$ and
$\mathit{timer}(p)$ is satisfied for all $p \in \bullet t$. The new state
$S' = (M', \mathit{timer}')$ is obtained from $S$ by firing $t$:
$M' = (M - \bullet t) \cup t\bullet$ and $\mathit{timer}'(p) = 0$ for all
$p \in t\bullet$. A timed firing sequence or timed run in a TPN is defined as
$\rho = S_0 \xrightarrow{t_1} S_1 \xrightarrow{t_2} S_2 \cdots \xrightarrow{t_n} S_n$,
where $S_0$ is the initial state. $S_{i+1}$ is obtained from $S_i$ by passing some
time until all rules in $\bullet t_{i+1}$ are satisfied and then firing $t_{i+1}$.
Let $\mathit{time}_i(\rho)$ be the sum of time that has passed for the system to
reach the state $S_i$ from the initial state $S_0$ through the firing sequence
$\rho$. It is true that $\mathit{time}_0(\rho) = 0$ and
$\mathit{time}_{i+1}(\rho) = \mathit{time}_i(\rho) + \tau$ where $l \le \tau \le u$,
$l = \max(\{\mathit{lower}(r) \mid r \in M_i \text{ for } r \in \bullet t_{i+1}\})$,
and $u = \max(\{\mathit{upper}(r) \mid r \in M_i \text{ for } r \in \bullet t_{i+1}\})$.
Thus, a run $\rho$ produces a timed trace
$(t_1, \mathit{time}_1(\rho))(t_2, \mathit{time}_2(\rho)) \cdots$. Let $X$ be the set of
all possible timed traces produced by a timed Petri net $N$. $X$ is also
prefix-closed.

Since the reachability analysis of a TPN can be uniquely determined by
all its possible timed traces, the system behavior can also be described using
trace theory. Trace theory has been applied to the verification of both
speed-independent and timed circuits. A timed trace, $x$, is a sequence of
events (i.e., $x = e_0 e_1 \ldots$). In trace theory, it is not necessary to
distinguish the rising and falling transitions on the same signal, so the signal
name is used to represent both transitions on the same signal. Therefore, each
timed event is of the form $e_i = (w_i, t_i)$, where $w_i$ is a signal name in the
TPN and $t_i$ is a rational number indicating when a transition on a signal wire
happens. A timed trace must satisfy the following two properties:

- Monotonicity: $t_i \le t_{i+1}$ for all $i \ge 0$, and

- Progress: if $x$ is infinite then for any time $t$ there exists an $i$ such that $t_i > t$.
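As a small illustration, monotonicity can be checked directly on a finite timed trace represented as a list of (wire, time) pairs. The helper name `is_monotonic` and the wire names `req` and `ack` are hypothetical; progress concerns infinite traces only and is not checked here.

```python
def is_monotonic(trace):
    """Return True if event times never decrease along a finite timed trace."""
    times = [t for (_wire, t) in trace]
    return all(a <= b for a, b in zip(times, times[1:]))


# Hypothetical traces over wires "req" and "ack".
ok_trace = [("req", 0), ("ack", 3), ("req", 3), ("ack", 7)]
bad_trace = [("req", 0), ("ack", 5), ("req", 4)]
```

Here `ok_trace` satisfies monotonicity (equal timestamps are allowed), while `bad_trace` does not because its last event is earlier than the one before it.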

The delete function, $\mathrm{del}(D)(x)$, removes all events of a trace
$x = e_1 e_2 \ldots$ whose wire names are in a set $D$. More formally,

$$\mathrm{del}(D)(x) = \begin{cases} y & \text{if } w_1 \in D \\ e_1 y & \text{otherwise} \end{cases} \qquad (1)$$

where $y = \mathrm{del}(D)(e_2 e_3 \ldots)$ and $e_i = (w_i, t_i)$. It is extended
naturally to sets of traces. The inverse delete function,
$\mathrm{del}^{-1}(D)(X)$, takes a set of wires, $D$, and a set of traces, $X$, and
returns the set of traces which would be in $X$ if all events with wire names in
$D$ are deleted (i.e., $\mathrm{del}^{-1}(D)(X) = \{x' \mid \mathrm{del}(D)(x') \in X\}$).
Intuitively, if $x$ is a trace not containing symbols from $D$,
$\mathrm{del}^{-1}(D)(x)$ is the set of all traces that can be generated by
inserting events in $D$ at any time into $x$.



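Some useful properties of these two functions are below:

$$\mathrm{del}(D)(X) = \emptyset \Leftrightarrow X = \emptyset \qquad (2)$$

$$\mathrm{del}(D)(\mathrm{del}^{-1}(D')(X)) = \mathrm{del}^{-1}(D')(\mathrm{del}(D)(X)) \text{ when } D \cap D' = \emptyset \qquad (3)$$

$$\mathrm{del}(D)(\mathrm{del}^{-1}(D)(X)) = X \qquad (4)$$

$$\mathrm{del}(D)(X \cap X') \subseteq \mathrm{del}(D)(X) \cap \mathrm{del}(D)(X') \qquad (5)$$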
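The delete function and the inclusion in Equation 5 can be exercised on small finite trace sets. The sketch below is illustrative only: the function names and the example wires `a`, `x`, `y` are hypothetical, traces are tuples of (wire, time) pairs, and the inverse delete is exposed only as a membership test because the set it denotes is generally infinite.

```python
def delete(D, trace):
    """del(D)(x): drop every event whose wire name is in D."""
    return tuple((w, t) for (w, t) in trace if w not in D)


def delete_set(D, traces):
    """del(D)(X): apply delete to every trace in a finite set."""
    return {delete(D, x) for x in traces}


def in_inverse_delete(D, X, candidate):
    """Membership in del^{-1}(D)(X): candidate belongs iff del(D)(candidate) is in X."""
    return delete(D, candidate) in X


# Two hypothetical trace sets that differ only in their internal wire.
D = {"x", "y"}
X1 = {(("a", 1), ("x", 2))}
X2 = {(("a", 1), ("y", 2))}

lhs = delete_set(D, X1 & X2)                  # del(D)(X1 ∩ X2)
rhs = delete_set(D, X1) & delete_set(D, X2)   # del(D)(X1) ∩ del(D)(X2)
```

In this example `lhs` is empty while `rhs` contains the single trace with only the event on `a`, so the inclusion in Equation 5 holds and is strict.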



A prefix-closed trace structure $T$ is a three-tuple $(I, O, P)$. $I$ is a set of
input wires, and $O$ is a set of output wires where $I \cap O = \emptyset$.
$A = I \cup O$ is the alphabet of the structure. $P = S \cup F$ is the set of all
possible traces of a system where $S$ and $F$ are the success set and the failure
set of $T$, respectively. The trace structure $T$ of a TPN $N$ can be derived using
state space exploration on $N$. A function $\mathrm{trace}(N)$ is defined to return
a trace structure which has the same inputs and outputs as $N$. $P$ of
$\mathrm{trace}(N)$ is the set of all possible timed traces produced by $N$. The
function $\mathrm{fail}(X)$ is defined to return the set of all traces in a set $X$
that cause safety violations or timing constraint violations. Therefore,
$F = \mathrm{fail}(P)$. For hierarchical verification to succeed, the definition of
$\mathrm{fail}(X)$ must satisfy the following requirement:

$$\mathrm{fail}(X) \subseteq \mathrm{fail}(X') \text{ if } X \subseteq X' \qquad (6)$$

where $X$ and $X'$ are two sets of traces. This requirement states that for two
sets of traces, correctness checking does not affect the relation of the two sets.
$S$ contains all successful traces of a system, and $S = P - F$. A trace structure
must be receptive, meaning that $PI \subseteq P$. Intuitively, this means a circuit
cannot prevent the environment from sending an input.

Composition ($\|$) combines two circuits into a single circuit. Composition of
two trace structures $T = (I, O, S, F)$ and $T' = (I', O', S', F')$ is defined when
$O \cap O' = \emptyset$. To compose two trace structures, the alphabets of both trace
structures must first be made the same by adding new inputs as necessary to each
structure. Inverse delete is extended to trace structures for this step as follows:

$$\mathrm{del}^{-1}(D)(T) = (I \cup D, O, \mathrm{del}^{-1}(D)(S), \mathrm{del}^{-1}(D)(F)) \qquad (7)$$

This is defined only when $D \cap A = \emptyset$. After the two alphabets of the two
structures are made to match, we need to find the traces that are consistent with
the two structures. The intersection of these two trace structures is defined as
follows:

$$T \cap T' = (I \cap I', O \cup O', S \cap S', (F \cap P') \cup (P \cap F')) \qquad (8)$$

This is defined only when $A = A'$ and $O \cap O' = \emptyset$. A success trace in the
composite must be a success trace in both components. A failure trace in the
composite is a possible trace that is a failure trace in either component. The
set of possible traces for the composite is $P \cap P'$. Composition can now be
defined:

$$T \| T' = \mathrm{del}^{-1}(A' - A)(T) \cap \mathrm{del}^{-1}(A - A')(T') \qquad (9)$$
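The intersection in Equation 8 can be illustrated on finite trace structures that already share an alphabet. The sketch below is a simplification under stated assumptions: traces are untimed tuples of wire names, the class `TraceStructure`, the function `intersect`, and the example structures `T1` and `T2` are hypothetical, and the inverse delete step of Equation 9 is omitted because it yields infinite sets in general.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceStructure:
    """Finite, untimed stand-in for a trace structure (I, O, S, F)."""
    I: frozenset
    O: frozenset
    S: frozenset  # success traces
    F: frozenset  # failure traces

    @property
    def P(self):
        # Possible traces are the union of successes and failures.
        return self.S | self.F


def intersect(T1, T2):
    """Intersection of two trace structures over the same alphabet (Equation 8)."""
    if (T1.I | T1.O) != (T2.I | T2.O):
        raise ValueError("alphabets must match")
    if T1.O & T2.O:
        raise ValueError("output sets must be disjoint")
    return TraceStructure(
        I=T1.I & T2.I,
        O=T1.O | T2.O,
        S=T1.S & T2.S,
        F=(T1.F & T2.P) | (T1.P & T2.F),
    )


# Hypothetical components: T1 drives b in response to a; T2 drives a and
# treats an event on b after a as a failure.
T1 = TraceStructure(frozenset({"a"}), frozenset({"b"}),
                    frozenset({(), ("a",), ("a", "b")}), frozenset())
T2 = TraceStructure(frozenset({"b"}), frozenset({"a"}),
                    frozenset({(), ("a",)}), frozenset({("a", "b")}))
T12 = intersect(T1, T2)
```

In `T12` the trace `("a", "b")` is a failure because it is possible in `T1` and a failure in `T2`, which matches the reading that a composite failure is a possible trace that fails in either component.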

Another useful operation is hide, which is used to make a set of wires, $D$,
internal to the circuit. Given a trace structure $T$, $\mathrm{hide}(D)(T)$ is
defined as follows:

$$\mathrm{hide}(D)(T) = (I, O - D, \mathrm{del}(D)(S), \mathrm{del}(D)(F)) \qquad (10)$$

A trace structure is failure-free if its failure set is empty. Given two trace
structures, $T$ and $T'$, we say $T$ conforms to $T'$ (denoted $T \preceq T'$) if
$I = I'$, $O = O'$, and for all environments $E$, if $E \| T'$ is failure-free, so is
$E \| T$. Intuitively, if a system using $T'$ cannot fail, neither can a system
using $T$.

Lemma 1 below gives a simple sufficient condition to determine conformance
between two trace structures. The condition $F \subseteq F'$ assures that if the
environment does not cause a failure in $T'$, it does not cause a failure in $T$.
The condition $P \subseteq P'$ assures that if $T'$ does not cause a failure in the
environment, $T$ does not cause one. Lemma 2 shows that if $T$ conforms to $T'$,
this conformance is maintained in any environment. Proofs of these lemmas are
given elsewhere.

Lemma 1. $T \preceq T'$ if $I = I'$, $O = O'$, $F \subseteq F'$, and $P \subseteq P'$.

Lemma 2. If $T \preceq T'$ and $T''$ is any trace structure, then $T \| T'' \preceq T' \| T''$.
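The sufficient condition of Lemma 1 is easy to test on finite sets. The following is a minimal sketch under stated assumptions: trace structures are plain dictionaries with keys `I`, `O`, `S`, and `F`, traces are untimed tuples of wire names, and the names `lemma1_conforms`, `impl`, and `spec` are hypothetical. A `False` result only means conformance was not established by this test, since the condition is sufficient rather than necessary.

```python
def lemma1_conforms(T, Tp):
    """Check the sufficient condition of Lemma 1 for T conforming to Tp."""
    P = T["S"] | T["F"]      # possible traces of T
    Pp = Tp["S"] | Tp["F"]   # possible traces of Tp
    return (T["I"] == Tp["I"] and T["O"] == Tp["O"]
            and T["F"] <= Tp["F"] and P <= Pp)


# Hypothetical implementation and specification over input a and output b.
impl = {"I": {"a"}, "O": {"b"},
        "S": {(), ("a",), ("a", "b")}, "F": set()}
spec = {"I": {"a"}, "O": {"b"},
        "S": {(), ("a",), ("a", "b"), ("a", "b", "a")}, "F": {("a", "a")}}
```

Here `lemma1_conforms(impl, spec)` holds because the implementation has no failures and every possible trace of it is possible in the specification; the reverse check fails.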

3 Automatic Abstraction and Safe Transformations 

Formal verification of timed systems is typically based on a complete exploration 
of the state space. The state space grows exponentially in the complexity of the 
design. This limits verification to small designs. In general, a large and complex 
design is organized as a number of components, each of which has a well-defined 
interface. To verify a timed system, an environment must be provided. The envi- 
ronment has two functions during verification. First, it defines and supplies the 
input behavior which the system must be able to process for correct operation. 
Second, the outputs of the system must not cause the environment to fail. Each 
component either connects to other components, the environment, or both. Since 
the complexity of each component is often much less than the whole system, it 
is desirable to verify each component individually, and integrate the results for 
all components when available to form the solution for the whole system. If a 
component is chosen for verification, the rest of the components and the system 
environment together form the environment in which the component operates. 
To verify a component, only the interface behavior of the environment is impor- 
tant to the component. Therefore, if the internal behavior of the environment is 
abstracted away while preserving its interface behavior, the environment can be 
simplified reducing the complexity of verification. 

To apply abstraction to TPNs, first, all internal signals relative to a chosen 
component are identified and all transitions on them converted to sequencing 
transitions; second, these sequencing transitions and the related places are
removed safely from the TPNs, when possible. Consider the TPN shown in
Figure 1(a). If we are synthesizing only the first stage of the two stage FIFO, then
the signals ack2, x2.t, and x2.f should be abstracted away. Transitions on these
signals are changed to sequencing transitions as shown in Figure 1(b).

Next, transformations are applied to remove these sequencing transitions.
Suzuki and Murata present a method of stepwise refinement of transitions
and places into subnets. They show a sufficient condition that such subnets
must satisfy which is dependent on the structure and initial marking of the net.
The resulting net has the same liveness and safety properties as that of the
original net. This refinement process, however, has to be repeated every time
the initial marking is changed. This makes automating the refinement difficult.
Berthelot presented several transformations that depend only on the structure
of the net. Several transformations for marked graphs have also been presented.
These transformations reduce places and transitions in the graph while preserving
liveness and safety. All these transformations, however, are only applied to
untimed Petri nets.

We have developed several safe transformations for timed Petri nets. Safe
transformations must obey two conditions. First, removal of a signal should
never change the untimed semantics of the environment. Second, the timing
information of the signal transitions produced by the environment must be
preserved in a conservative fashion. To explain these two conditions more
precisely, we use trace theory. Suppose $N_E$ is the TPN describing the behavior of
the environment, and $T_E$ is its corresponding trace structure. The interface
behavior of $T_E$ is described by $\mathrm{del}(D)(P_E)$, where $D$ is the set of
signals internal to the environment, and $P_E$ is the set of possible traces. The
environment after abstraction and safe transformations is called the abstracted
environment. In the abstracted environment, the internal signals, $D$, are removed
from $N_E$ to obtain the trace structure
$T_A = \mathrm{trace}(\mathrm{abs}(D)(N_E))$. Function $\mathrm{abs}(D)(N_E)$ returns a
TPN where the signals in $D$ are abstracted away from $N_E$ using safe
transformations. Let $X_1$ and $X_2$ be the untimed trace sets produced by
$\mathrm{abs}(D)(N_E)$ and $N_E$, respectively. To preserve the interface behavior, a
safe transformation must satisfy $X_1 = \mathrm{del}(D)(X_2)$ and
$\mathrm{del}(D)(P_E) \subseteq P_A$, where $D$ contains the internal signals of the
environment to be removed and $P_A$ is the possible trace set of $T_A$.
Intuitively, this means a safe transformation should never remove any specified
behavior, but it may add new behavior. In other words, the verification result
might be a false negative, but never a false positive.

Figure 2 shows two simple transformations. Transformation 1 is used when
a sequencing transition has a single or multiple places in its preset, and a single
place in its postset. In transformation 2, the sequencing transition has a single
place in its preset, and two or more places in its postset. While transformation 1
adds no extra behaviors, transformation 2 may create extra interleavings
between $b$ and $c$ not seen before the transformation. For example, after the
transformation, the system could generate a trace
$(a, t_a)(c, t_a + l_1 + l_3)(b, t_a + u_2 + u_3)$, where $t_a$ is when $a$ fires.
This trace is impossible in the system before the transformation.

The third transformation, which involves a merge place, is depicted in Figure 3.
This transformation, like the last one, may add additional timing behavior.
However, if $l_a = l_b$ and $u_a = u_b$ then it is an exact transformation. This
transformation is applied to the TPN in Figure 1(b) to obtain the reduced one shown
in Figure 1(c). Numerous other safe transformations have been developed and
proven to be correct. Due to space limitations, these transformations and all
proofs are omitted here, but can be found in [23].

In order to perform verification using TPNs, the possible traces $P$ can be
found using a timed state space exploration procedure. After safe
transformations, it is true that $\mathrm{del}(D)(P_E) \subseteq P_A$ where
$T_A = \mathrm{trace}(\mathrm{abs}(D)(N_E))$ and $D$ contains the internal signals to be
removed. This indicates that the interface behavior of the environment after
transformations is a superset of that before transformations. From Equation 6 we
get the following:

$$\mathrm{fail}(\mathrm{del}(D)(P_E)) \subseteq \mathrm{fail}(P_A) \qquad (11)$$

This means that the failure set of the abstracted environment is a superset of the
failure set of the unabstracted environment with internal signals hidden. Based
on the above discussion and Lemma 1, the following lemma can be proved easily.

Fig. 3. Safe Transformation 3.



Lemma 3. Consider a system described by a TPN, $N_E$, with a trace structure
$T_E$, where $D$ is the set of internal signals of the system. If the function
$\mathrm{abs}(D)(N_E)$ uses only safe transformations, then
$\mathrm{hide}(D)(T_E) \preceq \mathrm{trace}(\mathrm{abs}(D)(N_E))$.

Hierarchical verification verifies each block in a system individually. If each
block is verified to be failure-free with its abstracted environment, then we can
prove that the entire system is failure-free. This idea is formalized in the
following theorems. Given two modules $M_1 = (I_1, O_1, P_1)$ and
$M_2 = (I_2, O_2, P_2)$, we would like to verify that their composition,
$M_1 \| M_2$, is failure-free. In the following theorem, $X_1$ and $X_2$ are the
internal signal sets of $M_1$ and $M_2$, respectively (i.e., $X_1 = O_1 - I_2$,
$X_2 = O_2 - I_1$, and $X_1 \cap X_2 = \emptyset$).

Theorem 1. Let $X_1$ and $X_2$ be the internal signal sets of $M_1$ and $M_2$,
respectively. If $M_1 \| \mathrm{hide}(X_2)(M_2)$ is failure-free, and
$\mathrm{hide}(X_1)(M_1) \| M_2$ is failure-free, then $M = M_1 \| M_2$ is
failure-free.

Proof: First, the failure set of $M_1 \| M_2$ is

$$(\mathrm{del}^{-1}(X_2)(\mathrm{fail}(P_1)) \cap \mathrm{del}^{-1}(X_1)(P_2)) \cup (\mathrm{del}^{-1}(X_1)(\mathrm{fail}(P_2)) \cap \mathrm{del}^{-1}(X_2)(P_1)) \qquad (12)$$

Suppose $M_1 \| \mathrm{hide}(X_2)(M_2)$ is failure-free. This means its failure set is empty.

$$(\mathrm{fail}(P_1) \cap \mathrm{del}^{-1}(X_1)(\mathrm{del}(X_2)(P_2))) \cup (P_1 \cap \mathrm{del}^{-1}(X_1)(\mathrm{fail}(\mathrm{del}(X_2)(P_2)))) = \emptyset \qquad (13)$$

$$\Rightarrow \mathrm{fail}(P_1) \cap \mathrm{del}^{-1}(X_1)(\mathrm{del}(X_2)(P_2)) = \emptyset \qquad (14)$$

Using Equation 4, Equation 14 can be transformed to:

$$\mathrm{del}(X_2)(\mathrm{del}^{-1}(X_2)(\mathrm{fail}(P_1))) \cap \mathrm{del}^{-1}(X_1)(\mathrm{del}(X_2)(P_2)) = \emptyset \qquad (15)$$

Using Equation 3, Equation 15 becomes:

$$\mathrm{del}(X_2)(\mathrm{del}^{-1}(X_2)(\mathrm{fail}(P_1))) \cap \mathrm{del}(X_2)(\mathrm{del}^{-1}(X_1)(P_2)) = \emptyset \qquad (16)$$

From Equation 5, Equation 16 can be transformed to:

$$\mathrm{del}(X_2)(\mathrm{del}^{-1}(X_2)(\mathrm{fail}(P_1)) \cap \mathrm{del}^{-1}(X_1)(P_2)) = \emptyset \qquad (17)$$

Finally, from Equation 2 we get the following result:

$$\mathrm{del}^{-1}(X_2)(\mathrm{fail}(P_1)) \cap \mathrm{del}^{-1}(X_1)(P_2) = \emptyset \qquad (18)$$

Now, suppose $M_2 \| \mathrm{hide}(X_1)(M_1)$ is failure-free. In a similar manner, we derive:

$$\mathrm{del}^{-1}(X_1)(\mathrm{fail}(P_2)) \cap \mathrm{del}^{-1}(X_2)(P_1) = \emptyset \qquad (19)$$

The union of Equations 18 and 19 is the failure set of $M_1 \| M_2$. Since both
Equations 18 and 19 are empty, the failure set of $M_1 \| M_2$ is empty. $\Box$

Calculation of $P$ is an exponential problem. Lemma 3 shows that
$\mathrm{hide}(D)(T_E) \preceq \mathrm{trace}(\mathrm{abs}(D)(N_E))$. Therefore, from
Lemma 2, $M_1 \| \mathrm{hide}(X_2)(M_2) \preceq M_1 \| \mathrm{trace}(\mathrm{abs}(X_2)(N_{M_2}))$
and $\mathrm{hide}(X_1)(M_1) \| M_2 \preceq \mathrm{trace}(\mathrm{abs}(X_1)(N_{M_1})) \| M_2$.
Using the above conclusions, we show another very important theorem.

Theorem 2. Let $X_1$ and $X_2$ be internal signal sets of $M_1$ and $M_2$,
respectively. If $M_1 \| \mathrm{trace}(\mathrm{abs}(X_2)(M_2))$ is failure-free and
$\mathrm{trace}(\mathrm{abs}(X_1)(M_1)) \| M_2$ is failure-free, then
$M = M_1 \| M_2$ is failure-free.



4 Results and Conclusions 

We have incorporated our abstraction technique into our VHDL and HSE compiler
frontend to the ATACS tool. The charts in Figure 4 show the comparative
runtimes for verification using POSET timing with and without abstraction
on two different FIFO circuits. Only the first few stages are shown as larger
FIFOs cannot be verified without abstraction. For the first FIFO, ATACS
completes 7 stages on the flat design; but with abstraction, it completes 100
stages in about 6.5 minutes. The second example is a multiple stage controller
for a self-timed FIFO that is very timing dependent. Without abstraction, only
4 stages can be analyzed. With abstraction, we can analyze 100 stages in 23
minutes.

The last example is the STARI communication circuit. The STARI circuit is used
to communicate between two synchronous systems that are operating at the same
clock frequency, but are out-of-phase due to clock skew. The STARI circuit is
composed of a number of FIFO stages built from 2 C-elements and 1 NOR-gate per
stage. There are two properties that need to be verified: (1) each data value
output by the transmitter must be inserted into the FIFO before the next one is
output and (2) a new data value must be output by the FIFO before each
acknowledgment from the receiver. To guarantee the second property, it is
necessary to initialize the FIFO to be approximately half-full. Earlier work
reports that COSPAN, which uses a region technique for timing verification, ran
out of memory attempting to verify a 3 stage gate-level version of STARI on a
machine with 1 GB of memory. That paper goes on to describe an abstract model
developed by hand for STARI for which they could verify 8 stages in 92.4 MB of
memory and 1.67 hours. A flat gate-level design for 10 stages can be verified in
124 MB and 20 minutes using POSET timing. Our automated abstraction method
verifies a 14 stage STARI with a maximum memory usage of 23 MB for a single
stage in about 5 minutes. The corresponding chart shows the comparative runtimes
for verification with and without abstraction on STARI using Bap, an enhanced
version of the POSET timing analysis algorithm. As shown in the chart, Bap can
verify STARI for up to 12 stages with a memory usage of 277 MB. In the first few
stages, the runtime for verification with abstraction is larger because
abstraction itself takes time. When the complexity of the design grows, the
runtime for flat verification grows much faster.

Since abstraction runtime grows polynomially in the size of the specification, 
the total runtime with abstraction grows in an approximately polynomial man- 
ner. This is substantially better than the exponential growth in the analysis of 
flat designs. We have also found that verification with abstraction is not only
several orders of magnitude faster than that for flat designs, but also successful
on several orders of magnitude more complex designs.

Abstract. We consider the randomized consensus protocol of Aspnes
and Herlihy for achieving agreement among N asynchronous processes 
that communicate via read/write shared registers. The algorithm guaran- 
tees termination in the presence of stopping failures within polynomial 
expected time. Processes proceed through possibly unboundedly many 
rounds; at each round, they read the status of all other processes and at- 
tempt to agree. Each attempt involves a distributed random walk: when 
processes disagree, a shared coin-flipping protocol is used to decide their 
next preferred value. Achieving polynomial expected time depends on 
the probability that all processes draw the same value being above an 
appropriate bound. For the non-probabilistic part of the algorithm, we 
use the proof assistant Cadence SMV to prove validity and agreement for 
all N and for all rounds. The coin-flipping protocol is verified using the 
probabilistic model checker PRISM. For a finite number of processes (up 
to 10) we automatically calculate the minimum probability of the pro- 
cesses drawing the same value. The correctness of the full protocol follows 
from the separately proved properties. This is the first time a complex
randomized distributed algorithm has been mechanically verified.



1 Introduction 

Randomization in the form of coin-flipping is a tool increasingly often used as a 
symmetry breaker in distributed algorithms, for example, to solve leader election 
or consensus problems. Such algorithms are inevitably difficult to analyse, and 
hence appropriate methods of automating their correctness proofs are called for. 
Furthermore, the use of random choices means that certain properties become 
probabilistic, and thus cannot be handled by conventional model checking tools. 
We consider the randomized consensus protocol due to Aspnes and Herlihy 
for achieving agreement among N asynchronous processes that communicate 
via read/write shared registers, which guarantees termination in the presence of
stopping failures in polynomial expected time. Processes proceed through possibly
unboundedly many rounds; at each round, they read the status of all other
processes and attempt to agree. Each agreement attempt involves a distributed
random walk (a Markov decision process, i.e. a combination of nondeterministic
and probabilistic choices): when processes disagree, a shared coin-flipping
protocol is used to decide their next preferred value. Achieving polynomial
expected time depends in an essential way on ensuring that the probability that all
non-failed processes draw the same value is above an appropriate bound.

One possible approach to analyse this algorithm is to verify it using a
probabilistic model checker such as PRISM. However, there are a number of
problems with this approach. Firstly, the model is infinite. Secondly, even when 
we restrict to a finite model by fixing the number of processes and rounds, the re- 
sulting models are very large: 9 x 10® states for the simpler (exponential expected 
time) protocol with 3 processes and 4 rounds. Thirdly, many of the requirements 
are non-probabilistic, and can be discharged with a conventional model checker. 
Therefore, we adopt a different approach, introduced by Pogosyants, Segala and
Lynch: we separate the algorithm into two communicating components, one
non-probabilistic (an asynchronous parallel composition of processes which
periodically request the outcome of a coin protocol) and the other probabilistic (a
coin-flipping protocol shared by the processes). For the non-probabilistic part
we use the proof assistant Cadence SMV, which enables us to verify the
non-probabilistic requirements for all N and for all rounds by applying reasoning
introduced in earlier work. The shared coin-flipping protocol is verified using the
probabilistic model checker PRISM. For a finite number of processes (up to 10) we
are able to mechanically calculate the minimum probability of the processes
drawing the same value, as opposed to a lower bound established analytically
using random walk theory. The correctness of the full protocol (for the finite
configurations mentioned above) follows from the separately proved properties.

This is the first time a complex randomized distributed algorithm has been
mechanically verified. Our proof structure is similar to an existing non-mechanical
proof, but the proof techniques differ substantially. Although we did not find
any errors, the techniques introduced here are applicable more generally, for
example, to analyse leader election and Byzantine agreement.

Related Work: The protocol discussed in this paper was originally proposed by
Aspnes and Herlihy, then further analysed in later work. Distributed algorithms
verified with Cadence SMV for any number of processes include the bakery
algorithm. We know of two other probabilistic model checkers, ProbVerus and
E⊢MC² (neither of which supports nondeterminism that is essential here).



2 The Protocol 

Consensus problems arise in many distributed applications, for example, when 
it is necessary to agree whether to commit or abort a transaction in a distributed 

http://www-cad.eecs.berkeley.edu/~kenmcmil/smv
database. A distributed consensus protocol is an algorithm for ensuring that a
collection of distributed processes, which start with some initial value supplied 
by their environment, eventually terminate agreeing on the same value. Typical 
requirements for such a protocol are: 

Validity: If a process decides on a value, then it is the initial value of a process. 
Agreement: Any two processes that decide must decide on the same value. 
Termination: All processes eventually decide. 
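The three requirements can be phrased as simple checks over one finished execution. The sketch below is illustrative only and is not part of the Cadence SMV or PRISM models: the function names are hypothetical, `start` maps each process identifier to its initial value, and `decided` maps each process that has decided to its decision.

```python
def check_validity(start, decided):
    """Every decided value is the initial value of some process."""
    return all(v in start.values() for v in decided.values())


def check_agreement(decided):
    """Any two processes that decide have decided on the same value."""
    return len(set(decided.values())) <= 1


def check_termination(start, decided):
    """Every process has decided (in this finished execution)."""
    return set(decided) == set(start)


# Hypothetical execution with three processes and preferences 1 or 2.
start = {1: 1, 2: 2, 3: 2}
decided_ok = {1: 2, 2: 2, 3: 2}
decided_bad = {1: 1, 2: 2}
```

For `decided_ok` all three checks hold. For `decided_bad`, validity still holds, but agreement fails because two different values were chosen, and termination fails because process 3 never decided.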

A number of solutions to the consensus problem exist.
There are several complications, due to the type of model (synchronous or asyn- 
chronous) and the type of failure tolerated by the algorithm. If the processes 
can exhibit stopping failures then the Termination requirement is too strong 
and must be replaced by wait-free termination: All initialized and non-failed 
processes eventually decide. Unfortunately, the fundamental impossibility result 
of [Z| demonstrates that there is no deterministic algorithm for achieving wait- 
free agreement in the asynchronous distributed model with communication via 
shared read/write variables even in the presence of one stopping failure. One
solution is to use randomization, which necessitates a weaker termination guar- 
antee: 

Probabilistic Wait-Free Termination: With probability 1, all initialized and
non-failed processes eventually decide. 

The algorithm we consider is due to Aspnes & Herlihy [Q. It is a complex algo- 
rithm using a sophisticated shared coin-flipping protocol. In addition to Valid- 
ity and Agreement, it guarantees Probabilistic wait-free termination with 
polynomial expected time for the asynchronous distributed model with commu- 
nication via shared read/write variables in the presence of stopping failures. 

The algorithm proceeds in rounds. Each process maintains two multiple-read 
single-write variables, recording its preferred value 1 or 2 (initially unknown, 
represented as 0), and its current round. The contents of the array start deter- 
mines the initial preferences. Additional storage is needed to record copies of the 
preferred value and round of all other processes as observed by a given process; 
we use arrays values and rounds for this purpose. Note that the round number is 
unbounded, and due to asynchrony the processes may be in different rounds at 
any point in time. In Cadence SMV we have the following variable declarations: 

#define N 10   /* number of processes (can be changed without affecting the proof) */
ordset PROC 1..N;   /* set of process identifiers */
ordset NUM 0..;     /* round numbers */
typedef PC {INITIAL, READ, CHECK, DECIDE, FAIL};   /* process phases */

act : PROC;                                  /* the scheduler’s choice of process */
start : array PROC of 1..2;                  /* start[i], initial preference of i */
pc : array PROC of PC;                       /* pc[i], the phase of process i */
value : array PROC of 0..2;                  /* value[i], current preference of i */
round : array PROC of NUM;                   /* round[i], current round number of i */
values : array PROC of array PROC of 0..2;   /* values[i][j], j’s preference when last read by i */
rounds : array PROC of array PROC of NUM;    /* rounds[i][j], j’s round number when last read by i */
count : array PROC of PROC;                  /* auxiliary counter for the reading loop */

^ See m for solutions based on read/modify/write variables, such as test-and-set.

The processes begin with the INITIALisation phase, where the unknown value is 
replaced with the preferred value from the array start and the round number is 
set to 1. Then each process repeatedly executes the READing then CHECKing 
phase until agreement. READing consists of reading the preferred value and 
round of all processes into the arrays values and rounds. Process i terminates in 
the CHECKing phase if it is a leader (i.e. its round is greater than or equal to 
that of any process) and if all processes whose round trails i’s by at most 1 (i.e. 
are presumed not to have failed) agree. Otherwise, if all leaders agree, i updates 
its value to this preference, increments its round and returns to READing. In 
the remaining case, if i has a definite preference it “warns” that it may change 
by resetting it to 0 and returns to READing without changing its round number; 
if its preference is already 0, then i invokes a coin-flipping protocol to select a
new value from {1,2} at random, increments its round number and returns to 
READing. 

In Cadence SMV a simplified protocol (we have removed the possibility of 
FAILure for clarity) can be described as follows, where the random choice of 
preference from {1,2} has been replaced by a nondeterministic assignment: 

switch (pc[act]) {
  INITIAL : {
    next(value[act]) := start[act];
    next(round[act]) := round[act] + 1;
    next(pc[act]) := READ; }
  READ : {
    next(pc[act]) := (count[act] = N) ? CHECK : READ;
    next(rounds[act][count[act]]) := round[count[act]];
    next(values[act][count[act]]) := value[count[act]];
    next(count[act]) := (count[act] = N) ? count[act] : count[act] + 1; }
  CHECK : {
    if (decide[act]) {   /* all who disagree trail by two and I am a leader */
      next(pc[act]) := DECIDE; }
    else if (agree[act][1] | agree[act][2]) {   /* all leaders agree */
      next(pc[act]) := READ;
      next(count[act]) := 1;
      next(value[act]) := agree[act][1] ? 1 : 2;   /* set value to leaders’ preference */
      next(round[act]) := round[act] + 1; }
    else {
      next(pc[act]) := READ;
      next(count[act]) := 1;
      next(value[act]) := (value[act] > 0) ? 0 : {1,2};   /* warn others or flip coin */
      next(round[act]) := (value[act] > 0) ? round[act] : round[act] + 1; } }
}

where the missing formulas decide and agree are defined below, assuming that
$j \in obs_i$ (process i has observed j) if either j < count[i] or pc[i] = CHECK:

agree[i][v] is true if, according to i, all leaders whose values have been read by
process i agree on value v, where v is either 1 or 2; formally:

$\mathit{agree}[i][v] \triangleq \bigwedge_{j} \mathit{array\_agree}[i][v][j]$
$\mathit{array\_agree}[i][v][j] \triangleq j \in obs_i \rightarrow (\mathit{rounds}[i][j] \geq \mathit{maxr}[i] \rightarrow \mathit{values}[i][j] = v)$
$\mathit{maxr}[i] \triangleq \max_{j \in obs_i} \mathit{rounds}[i][j]$

decide[i] is true if, according to i, all that disagree trail by 2 or more rounds
and i is a leader; formally:

$\mathit{decide}[i] \triangleq \mathit{maxr}[i] = \mathit{round}[i] \wedge (\mathit{m1\_agree}[i][1] \vee \mathit{m1\_agree}[i][2])$
$\mathit{m1\_agree}[i][v] \triangleq \bigwedge_{j} \mathit{array\_m1\_agree}[i][v][j]$
$\mathit{array\_m1\_agree}[i][v][j] \triangleq j \in obs_i \rightarrow (\mathit{rounds}[i][j] \geq \mathit{maxr}[i] - 1 \rightarrow \mathit{values}[i][j] = v)$

The above necessitates a variable, maxr, to store the maximum round number.
The full protocol can be found at www.cs.bham.ac.uk/~dxp/prism/consensus.
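
As an informal illustration of the simplified, failure-free protocol described above, the following Python sketch executes it step by step. It is not the verified Cadence SMV model: the uniformly random scheduler, the use of an independent fair coin for each flip, the function name simulate and the step limit are all assumptions made for this example, and process identifiers are assumed to be 1..N.

```python
import random

def simulate(start, seed=0, max_steps=100_000):
    """Run the simplified protocol; start maps process ids 1..N to 1 or 2.

    Returns the final value of every process once all have decided,
    or None if max_steps scheduler steps were not enough.
    """
    rng = random.Random(seed)
    procs = sorted(start)                      # assumed to be 1..N
    n = len(procs)
    pc = {i: "INITIAL" for i in procs}
    value = {i: 0 for i in procs}
    rnd = {i: 0 for i in procs}
    count = {i: 1 for i in procs}
    values = {i: {j: 0 for j in procs} for i in procs}
    rounds = {i: {j: 0 for j in procs} for i in procs}

    def all_agree(i, v, slack):
        # In CHECK every process has been observed, so obs_i = PROC.
        # slack = 0 gives agree[i][v]; slack = 1 gives the "trail by at most 1" test.
        maxr = max(rounds[i].values())
        return all(values[i][j] == v
                   for j in procs if rounds[i][j] >= maxr - slack)

    for _ in range(max_steps):
        active = [i for i in procs if pc[i] != "DECIDE"]
        if not active:
            return dict(value)
        act = rng.choice(active)               # assumption: random scheduler
        if pc[act] == "INITIAL":
            value[act], rnd[act], pc[act] = start[act], rnd[act] + 1, "READ"
        elif pc[act] == "READ":
            j = count[act]
            rounds[act][j], values[act][j] = rnd[j], value[j]
            if j == n:
                pc[act] = "CHECK"
            else:
                count[act] = j + 1
        else:                                  # CHECK
            leader = max(rounds[act].values()) == rnd[act]
            if leader and (all_agree(act, 1, 1) or all_agree(act, 2, 1)):
                pc[act] = "DECIDE"
                continue
            pc[act], count[act] = "READ", 1
            if all_agree(act, 1, 0) or all_agree(act, 2, 0):
                value[act] = 1 if all_agree(act, 1, 0) else 2
                rnd[act] += 1
            elif value[act] > 0:
                value[act] = 0                 # warn others, keep the round
            else:
                value[act] = rng.choice([1, 2])  # assumption: independent fair coin
                rnd[act] += 1
    return None
```

Running simulate on a few seeds gives a quick sanity check of Validity and Agreement on individual runs; it does not replace the proof, which covers every schedule.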
It remains to provide a coin-flipping protocol which returns a preference 1 or
2, with a certain probability, whenever requested by a process in an execution.
This could simply be a collection of N independent coins, one for each process,
which deliver 1 or 2 with probability 1/2 (independent of the current round). In
PP it is shown that such an approach yields exponential expected time to
termination. The polynomial expected time is guaranteed by a shared coin protocol,
which implements a collective random walk parameterised by the number of
processes N and a constant K > 1 (independent of N). A new copy of this protocol
is started for each round. The processes access a global shared counter, initially
0. On entering the protocol, a process flips a coin, and, depending on the
outcome, increments or decrements the shared counter. Since we are working in a
distributed scenario, several processes may simultaneously want to flip a coin,
which is modelled as a nondeterministic choice between probability distributions,
one for each coin flip. Note that several processes may be executing the protocol
at the same time. Having flipped the coin, the process then reads the counter,
say observing c. If c ≥ KN it chooses 1 as its preferred value, and if c ≤ -KN
it chooses 2. Otherwise, the process flips the coin again, and continues doing
so until it observes that the counter has passed one of the barriers. The
barriers ensure that the scheduler cannot influence the outcome of the protocol by
suspending processes that are about to move the counter in a given direction.
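
The random walk behind the shared coin can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the paper: the scheduler is uniformly random rather than adversarial, a process's coin flip and its subsequent read of the counter are treated as one atomic step, the counter is unbounded, and the barriers are taken to be reached when the counter is at least KN or at most -KN. The name shared_coin is hypothetical.

```python
import random

def shared_coin(n, k, seed=0):
    """One run of the shared coin for n processes and barrier constant k.

    Returns a dict mapping each process id (1..n) to its chosen value.
    """
    rng = random.Random(seed)
    counter = 0
    chosen = {}
    while len(chosen) < n:
        # Assumption: a random scheduler picks among the processes still flipping.
        i = rng.choice([p for p in range(1, n + 1) if p not in chosen])
        counter += rng.choice([1, -1])     # flip a coin, move the shared counter
        if counter >= k * n:               # upper barrier reached: prefer 1
            chosen[i] = 1
        elif counter <= -k * n:            # lower barrier reached: prefer 2
            chosen[i] = 2
    return chosen
```

Repeating shared_coin over many seeds and counting the runs in which all processes choose the same value gives an empirical agreement frequency for this benign scheduler only; the verified property concerns the minimum probability over all schedulers.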

We denote by CF such a coin-flipping protocol and by $CF_r$ the collection of
protocols, one for each round number r. Model checking of the shared coin
protocol is described in Section 5.

3 The Proof Structure 

Recall that to verify this protocol correct we need to establish the properties of 
Validity, Agreement and Probabilistic wait-free termination. The first 
two are independent of the actual values of probabilities. Therefore, we can 
verify these properties by conventional model checking methods, replacing the
probabilistic choices with nondeterministic ones. In Section 4 we describe how
we verify Validity and Agreement using the methods introduced in OTMl
for Cadence SMV.

We are left to consider Probabilistic wait-free termination. This prop- 
erty depends on the probability values with which either 1 or 2 is drawn, and, 
in particular, on the probabilistic properties of the coin-flipping protocol. How- 
ever, there are several probabilistic progress properties which do not depend 
on any probabilistic assumptions. Similarly to the approach of uni we analyse 
such properties in the non-probabilistic variant of the model, except we use Ca- 
dence SMV, thus confining the probabilistic arguments to a limited section of 
the analysis. 

We now describe the outline of the proof based on |E|. First, we identify
subsets of states of the protocol as follows: $\mathcal{D}$, the set of states in which all
processes have decided; and $\mathcal{F}_v$, for $v \in \{1,2\}$, the set of states where there
exists $r \in \mathbb{N}$ and a unique process i such that i’s preferred value is v, i has just
entered round r, and i is the only leader.

Non-probabilistic Arguments: There are a number of non-probabilistic
arguments, see m. We state the two needed to explain the main idea of the proof:

Invariant 1 From any state, the maximum round does not increase by more
than 1 without reaching a state in $\mathcal{F}_1 \cup \mathcal{F}_2 \cup \mathcal{D}$.

Invariant 2 From any state of $\mathcal{F}_v$ with maximum round r, if in round r all
processes leave the protocol $CF_r$ agreeing on the value v, then the maximum
round does not increase by more than 2 without reaching a state in $\mathcal{D}$.

These properties are independent of the probabilities of the coin-flipping
protocol. So we can replace the random choices of CF with nondeterministic ones,
except in round r where $CF_r$ must return value v for all processes.

Probabilistic Arguments: There are two probabilistic properties, listed below.

C1 For each fair execution of $CF_r$ that starts with a reachable state of $CF_r$,
with probability 1 all processes that enter $CF_r$ will eventually leave.

C2 For each fair execution of $CF_r$ and each value $v \in \{1,2\}$, the probability
that all processes that enter $CF_r$ will eventually leave agreeing on the value
v is at least p, where 0 < p < 1.

Putting the Arguments Together: By Invariant 1 and C1 (since the coin-flipping
protocol must return a value in order to continue), from any reachable state of
the combined protocol, under any scheduling of nondeterminism, with probability
1 one can always reach a state either in $\mathcal{D}$, $\mathcal{F}_1$ or $\mathcal{F}_2$ such that the maximum
round number increases by at most 1. Next, by Invariant 2, C1 and C2, from a
state in $\mathcal{F}_v$, under any scheduling of nondeterminism, with probability at least p
one can always reach a state in $\mathcal{D}$ with the maximum round number increasing
by at most 2. Therefore, from these two properties, starting from any
reachable state of the combined protocol, under any scheduling of nondeterminism,
with probability at least p one can always reach a state in $\mathcal{D}$ (all processes have
decided) with the maximum round number increasing by at most 3 (= 1 + 2).

It then follows that the expected number of rounds until $\mathcal{D}$ is reached is $O(1/p)$.
Thus, using independent coins, where $p = 1/2^N$, the expected number of rounds is
$O(2^N)$. For the shared coin protocol, since $p = (K-1)/2K$, it is $O(1)$ (i.e. constant).
This is because the round number does not increase while the processes perform
the shared coin protocol. However, we must take account of the number of steps
performed within the shared coin protocol; by random walk theory this yields
expected time of $(K+1)^2N^2 = O(N^2)$ P, which is indeed polynomial.
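
The round bounds can be made concrete with a small calculation. The sketch below reads the argument above as a sequence of independent attempts, each consuming at most 3 rounds and succeeding with probability at least p, which gives an expected bound of 3/p rounds; this reading, the function names, and the value p = 1/2^N for independent coins are assumptions for illustration, while p = (K - 1)/2K is the shared coin bound reported in Fig. 2.

```python
from fractions import Fraction

def expected_round_bound(p):
    """Upper bound 3/p on the expected number of rounds (geometric trials)."""
    return 3 / Fraction(p)

def p_independent(n):
    """Assumed agreement probability with n independent fair coins."""
    return Fraction(1, 2 ** n)

def p_shared(k):
    """Lower bound (K - 1)/2K for the shared coin with barrier constant k."""
    return Fraction(k - 1, 2 * k)

if __name__ == "__main__":
    for n in (2, 4, 8, 10):
        print(n, expected_round_bound(p_independent(n)))   # grows like 2^N
    for k in (8, 16, 32, 64):
        print(k, float(expected_round_bound(p_shared(k))))  # stays below 7
```

The first loop grows exponentially in N, whereas the second is bounded by a constant independent of N, matching the contrast between $O(2^N)$ and $O(1)$.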

In the sequel we show how to use Cadence SMV and PRISM to mechanically 
verify the non-probabilistic and probabilistic arguments respectively. These have 
to be carried out at a low level, and therefore constitute the most tedious and 
error-prone part of the analysis. The remaining part of the proof, in which the 
separately verified arguments are put together, is not proved mechanically. It is 
sufficiently high level that it can be easily checked by hand. We believe that a 
fully mechanical analysis can be achieved with the help of a theorem prover. 



4 The Cadence SMV Proof 



Cadence SMV is a proof assistant which supports several compositional methods 
j r.a I ;tl 1 4j . These methods permit the verification of large, complex, systems by 
reducing the verification problem to small problems that can be solved automat- 
ically by model checking. Cadence SMV provides a variety of such techniques 
including induction, circular compositional reasoning, temporal case splitting and 
data type reduction. For example, data type reduction is used to reduce large or 
infinite types to small finite types, and temporal case splitting breaks the proof 
into cases based on the value of a given variable. Combining data type reduction 
and temporal case splitting can reduce a complex proof to checking only a small 
number of simple subcases, thus achieving significant space savings. 

There are two main challenges posed by the algorithm we consider: the round 
numbers are unbounded, leading to an infinite data type NUM , and we wish to 
prove the correctness for any number of processes, or, in other words, for all 
values of N. We achieve this by suitably combining data type reduction (ordset) 
with induction, circular compositional reasoning and temporal case splitting. 

We briefly explain the ordset data type reduction implemented in Cadence
SMV Pl] with the help of the type NUM. For a given value r this reduction
constructs an abstraction of this type shown in Figure 1, where the only constant
is 0. The only operations permitted on this type are: equality/inequality
testing (between abstract values), equality/inequality test against a constant, and
increment/decrement the value by 1. For example, the following are allowed:
comparisons r > 0 and r = 0 (but not r = 1) and next(r) := r + 1. With these
restrictions on the operations, the abstract representations as shown in Figure 1
are isomorphic for all r ∈ NUM. Therefore, it suffices to check a property for a
single value of r. The reduction of the data type PROC is similar, except that
there are two constants, 1 and N; see (H for more detail.

Fig. 1. Abstraction of NUM.



We now illustrate the ordset reduction with a simple property, concern- 
ing the global maximum round, that is, the maximum round number over all 
processes. In Cadence SMV we can define this as follows: 

next(gmaxr) := next(round[act]) > gmaxr ? next(round[act]) : gmaxr;

However, since act ranges over PROC, the value of gmaxr depends on all
instances round[i] for i in PROC. We therefore introduce a history variable which
records the value of round[act] and replaces the implicit dependence on N with
a dependence on a single variable. We redefine gmaxr as follows:

next(hist) := next(round[act]);
next(gmaxr) := next(hist) > gmaxr ? next(hist) : gmaxr;

We can now state that gmaxr is indeed the global maximum round number: 

forall (i in PROC) max[i] : assert G (round[i] <= gmaxr);

To prove this holds, we case split on the value of round[i] and suppose that max[i] 
holds at time t—1. Furthermore, by setting the variables that do not affect the 
satisfaction of max[i] to be free (allowing these variables to range over all the 
possible values of their types), we can improve the efficiency of model checking 
by a factor of 10. Though perhaps not important for this simple property, such 
improvements are crucial for more complex properties, as without freeing certain 
variables model checking quickly becomes infeasible. The proof is: 

forall (r in NUM) {
  subcase max[i][r] of max[i] for round[i] = r;   /* case split on round[i] */
  using (max[i]),   /* assume max[i] holds at time t - 1 */
    agree//free, decide//free, start//free, value//free,   /* free variables */
  prove max[i][r]; }

Through the ordset data type reduction SMV reduces this proof to checking
max[i][r] for a single value of i (=2) and a single value of r (=1).
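
The role of the history variable can be checked informally on random runs. The Python sketch below mirrors the update of hist and gmaxr and asserts the invariant round[i] <= gmaxr after every step; the random scheduler, the random choice of whether the scheduled process advances its round, and the name run are assumptions of this example and are not part of the SMV model.

```python
import random

def run(n=4, steps=200, seed=0):
    """Track gmaxr through a history variable and check round[i] <= gmaxr."""
    rng = random.Random(seed)
    rounds = [0] * n
    gmaxr = 0
    for _ in range(steps):
        act = rng.randrange(n)                  # scheduler's choice of process
        rounds[act] += rng.choice([0, 1])       # next(round[act])
        hist = rounds[act]                      # next(hist) := next(round[act])
        gmaxr = hist if hist > gmaxr else gmaxr # next(gmaxr)
        assert all(r <= gmaxr for r in rounds)  # the property max[i], for all i
    return rounds, gmaxr
```

Because gmaxr is updated from the single variable hist, the update itself never inspects the whole array round, which is what removes the implicit dependence on N.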

The full proof of Validity, Agreement and Non-probabilistic progress
is available at www.cs.bham.ac.uk/~dxp/prism/consensus. The proof consists
of approximately 50 lemmas, requiring at most 270 MB of memory. Judicious
choice of data reduction/freeing is important, as otherwise SMV may return false,
but SMV allows one to inspect the cone of influence to identify the variables that
are used in the proofs.

The version of Cadence SMV we have used is not fully compatible with the release 
of 08.08.00. 






4.1 Proof of Validity 

We now outline the proof of Validity, which we verify by proving the contra- 
positive: if no process starts with value v then no process decides on v. For 
simplicity suppose v = 2. The hypothesis is that no process starts with value 2: 

forall (i in PROC) valid_assump[i] : assert G ~(start[i] = 2);

which is assumed throughout the proof, and the conclusion is:

forall (i in PROC) validity[i] : assert G (pc[i] = DECIDE -> ~(value[i] = 2));

The important step in proving validity is seeing that if all processes start with
preference 1, then any process i past its INITIAL phase, i.e. whose round number
is positive, has preferred value 1 and the predicate agree[i][1] holds. To prove
validity we therefore first prove the stronger properties:

forall (i in PROC) {
  valid1[i] : assert G (round[i] > 0 -> value[i] = 1);
  valid2[i] : assert G (round[i] > 0 -> agree[i][1]); }

We prove valid1[i] by case splitting on round[i] and assuming valid2[i] holds
at time t - 1. Also, since round[i] = 0 is a special case, we must add 0 to
the abstraction of NUM (otherwise Cadence SMV returns false), i.e. NUM is
abstracted to 0, {1, ..., r - 1}, r, {r + 1, ...}. The proof in Cadence SMV has the
following form:

forall (r in NUM) {
  subcase valid1[i][r] of valid1[i] for round[i] = r;
  using valid_assump[i], (valid2[i]), NUM -> {0, r}, ..., prove valid1[i][r]; }
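
To see why one value of r suffices once NUM is abstracted to the four classes 0, {1, ..., r - 1}, r and {r + 1, ...}, the following Python sketch maps concrete round numbers to these classes and tabulates which classes an increment can lead to. The class labels and function names are hypothetical, chosen for this illustration; Cadence SMV performs the reduction internally.

```python
def abstract(n, r):
    """Class of the concrete round number n relative to r (r >= 1)."""
    if n == 0:
        return "0"
    if n < r:
        return "1..r-1"
    if n == r:
        return "r"
    return ">r"

def successors(r, limit=50):
    """For each class, the set of classes reachable by incrementing by 1."""
    table = {}
    for n in range(limit):
        table.setdefault(abstract(n, r), set()).add(abstract(n + 1, r))
    return table

if __name__ == "__main__":
    print(successors(3))
    print(successors(7))
```

For every r of at least 3 the two tables coincide: the abstract transition structure does not depend on the concrete value of r, so a property that only uses the permitted operations need be checked for one representative.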

To prove valid2[i], we have the additional complication of agree[i][1] being
defined as the conjunction of an array (array_agree[i][1][j] for j ∈ PROC), which
again contains an implicit dependency on all values of the set PROC. Instead,
we consider each element of the array separately. In particular, we first prove
the auxiliary property valid3[i] elementwise, assuming valid1 holds, and again
add 0 to the abstraction of NUM:

forall (i in PROC) forall (j in PROC) {
  valid3[i][j] : assert G (round[i] > 0 -> array_agree[i][1][j]);
  forall (r in NUM) {
    subcase valid3[i][j][r] of valid3[i][j] for maxr[i] = r;
    using valid_assump[j], valid1[j], NUM -> {0, r}, ..., prove valid3[i][j][r]; }}

Next we use valid3[i][j] to prove valid2[i] through a proof by contradiction: first
consider the processes j such that array_agree[i][1][j] is false:

forall (i in PROC) y[i] := { j : j in PROC, ~array_agree[i][1][j] };

Then we consider a particular j ∈ y[i] when proving valid2[i] while using the
fact that valid3[i][j] holds:

forall (j in PROC) {
  subcase valid2[i][j] of valid2[i] for y[i] = j;
  using valid3[i][j], ..., prove valid2[i][j]; }

The contradiction then arises since, by valid3[i][j], array_agree[i][1][y[i]] must be
true. The apparent circularity between these properties is broken since valid1
assumes valid2 at time t - 1.







4.2 Proof of Agreement 

We now outline the proof of Invariant 6.3 of uni which is used to prove
Agreement, the most difficult of the requirements. First we define new predicates
fill_maxr[i], array_fill_agree[i][v][j] and fill_agree[i][v] to be the same as the
corresponding predicates maxr[i], array_agree[i][v][j] and agree[i][v], except an
incomplete observation of a process is “filled in” with the actual values of the
unobserved processes. More formally:

$\mathit{fill\_rounds}[i][j] \triangleq$ if $j \in obs_i$ then $\mathit{rounds}[i][j]$ else $\mathit{round}[j]$
$\mathit{fill\_values}[i][j] \triangleq$ if $j \in obs_i$ then $\mathit{values}[i][j]$ else $\mathit{value}[j]$.

Invariant 6.3 of |p^. Let i be a process. Given a reachable state, let v =
value[i]. If fill_maxr[i] = round[i], m1_agree[i][v] and fill_agree[i][v], then

a. $\forall j\; \mathit{agree}[j][v]$
b. $\forall j\; \mathit{round}[j] > \mathit{round}[i] \rightarrow \mathit{value}[j] = v$
c. $\forall j \in obs_i\; (\mathit{round}[j] = \mathit{round}[i] - 1 \wedge \mathit{value}[j] \neq v) \rightarrow \mathit{fill\_maxr}[j] < \mathit{round}[i]$.

We now describe our approach to proving Invariant 6.3. For simplicity, we have
restricted our attention to when v = 1. To ease the notation we let:

$C[i] \triangleq (\mathit{fill\_maxr}[i] = \mathit{round}[i]) \wedge \mathit{m1\_agree}[i][1] \wedge \mathit{fill\_agree}[i][1] \wedge (\mathit{value}[i] = 1)$.

We first split Invariant 6.3 into separate parts corresponding to the conditions 
a, b and c. The main reason for this is that the validity of the different cases de- 
pends on different variables of the protocol. We are therefore able to “free” more 
variables when proving the cases separately, and hence improve the efficiency of 
the model checking. Formally, conditions a and b of Invariant 6.3 are given by: 

forall (i in PROC) forall (j in PROC)
inv63a[i][j] : assert G (C[i] -> agree[j][1]);
inv63b[i][j] : assert G (C[i] -> (round[j] > round[i] -> value[j] = 1));

Note that, when proving inv63a[i][j], agree[j][1] is the conjunction of an array.
We therefore use the same proof technique as outlined for valid2[i] in Section 4.1,
that is, we first prove:

forall (i in PROC) forall (j in PROC) forall (k in PROC)
inv63ak[i][j][k] : assert G (C[i] -> array_agree[j][1][k]);

We encounter a similar problem with the precondition, C[i], since m1_agree[i][1]
and fill_agree[i][1] are conjunctions of arrays. In this case, we use a version of
Lemma 6.12 of uni. Informally, this lemma states: if C[i] holds in the next state
and the transition to reach this state does not involve process i changing the
value of round[i] or value[i], then C[i] holds in the current state. More precisely,
we have the following properties:

forall (i in PROC) {
  lem612a[i] : assert G ((~(act = i) & X (C[i])) -> C[i]);
  lem612b[i] : assert G ((act = i & (pc[i] = READ) & X (C[i])) -> C[i]);
  lem612c[i] : assert G ((act = i & X ((pc[i] = DECIDE) & C[i])) -> C[i]); }

When proving inv63ak[i][j][k] we case split on round[i] and assume
inv63ak[i][j][k] and inv63b[i][k] hold at time t - 1 (Invariant 6.3c is not needed).
Additional assumptions include those of Lemma 6.12 given above. Also, since
m1_agree[i] involves r - 1 where r is of type NUM, we abstract NUM to
{0, ..., r - 2}, r - 1, r, {r + 1, ...}. The actual proof in Cadence SMV has the
following form:

forall (r in NUM) {
  subcase inv63ak[i][j][k][r] of inv63ak[i][j][k] for round[i] = r;
  using (inv63ak[i][j][k]), (inv63b[i][k]), lem612a[i], lem612b[i], lem612c[i],
    NUM -> {r-1..r}, ..., prove inv63ak[i][j][k][r]; }



5 Verification with PRISM 

PRISM, a Probabilistic Symbolic Model Checker, is an experimental tool;
see www.cs.bham.ac.uk/~dxp/prism. It is built in Java/C++
using the CUDD package, which supports MTBDDs. The system description
language of the tool is a probabilistic variant of Reactive Modules. The
specifications are given as formulas of the probabilistic temporal logic PCTL.
PRISM builds a symbolic representation of the model as an MTBDD and
performs the analysis implementing the algorithms of m. It supports a
symbolic engine based on MTBDDs as well as a sparse matrix engine.

A summary of experimental results obtained from the shared coin-flipping
protocol modelled and analysed using the MTBDD engine is included in the table
below. Further details, including the description of the coin-flipping protocol, can
be found at www.cs.bham.ac.uk/~dxp/prism/consensus. Both properties C1
and C2 are expressible in PCTL. C1 is a probability 1 property, and therefore
admits efficient qualitative probabilistic analysis such as the probability-1
precomputation step [3, whereas C2, on the other hand, is quantitative, and
requires calculating the minimum probability that, starting from the initial state
of the coin-flipping protocol, all processes leave the protocol agreeing on a given
value. Our analysis is mechanical, and demonstrates that the analytical lower
bound (K - 1)/2K is reasonably tight (the discrepancy is greater for
smaller values of K, not included).



                          construction  C1        C2        C2           bound
N   K   #states           time (s)      time (s)  time (s)  probability  (K - 1)/2K
2   64  8,208             1.108         0.666     3689      0.493846     0.4921875
4   32  329,856           2.796         6.497     212784    0.494916     0.484375
8   16  437,194,752       54.881        59.668    1085300   0.47927      0.46875
10  8   10,017,067,008    26.719        139.535   986424    0.4463       0.4375

Fig. 2. Model Checking of the Coin-Flipping Protocol.
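
The tightness of the analytical bound can be read off the figures in Fig. 2 directly. The short Python sketch below recomputes (K - 1)/2K for each reported K and the gap to the model-checked minimum probability; the variable and function names are chosen for this illustration, and the data are copied from the table.

```python
# (N, K, model-checked minimum probability for C2), as reported in Fig. 2.
ROWS = [(2, 64, 0.493846), (4, 32, 0.494916), (8, 16, 0.47927), (10, 8, 0.4463)]

def bound(k):
    """Analytical lower bound (K - 1)/2K on the agreement probability."""
    return (k - 1) / (2 * k)

def gaps(rows=ROWS):
    """Difference between the computed probability and the analytical bound."""
    return [(n, k, prob - bound(k)) for n, k, prob in rows]

if __name__ == "__main__":
    for n, k, gap in gaps():
        print(f"N={n:2d} K={k:2d} bound={bound(k):.7f} gap={gap:.6f}")
```

In every reported configuration the computed probability exceeds the bound by less than 0.011.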
6 Conclusion

In this paper we have for the first time mechanically verified a complex
randomized distributed algorithm, thus replacing tedious proofs by hand of a large
number of lemmas with manageable, re-usable, and efficient proofs with
Cadence SMV and an automatic check of the probabilistic properties with PRISM.
The verification of the protocol is fully mechanised at the low level, while some
simple high-level arguments are carried out manually. A fully automated proof
can be achieved by involving a theorem prover for the manual part of the
analysis. We believe that the techniques introduced here are applicable more generally,
for example, to analyse leader election and Byzantine agreement.
Abstract. Recursive state machines (RSMs) enhance the power of
ordinary state machines by allowing vertices to correspond either to ordinary
states or to potentially recursive invocations of other state machines.
RSMs can model the control flow in sequential imperative programs
containing recursive procedure calls. They can be viewed as a visual
notation extending Statecharts-like hierarchical state machines, where
concurrency is disallowed but recursion is allowed. They are also related
to various models of pushdown systems studied in the verification and
program analysis communities.

After introducing RSMs, we focus on whether state-space analysis can
be performed efficiently for RSMs. We consider the two central problems
for algorithmic analysis and model checking, namely, reachability (is a
target state reachable from initial states) and cycle detection (is there
a reachable cycle containing an accepting state). We show that both
these problems can be solved in time $O(n\theta^2)$ and space $O(n\theta)$, where
n is the size of the recursive machine and θ is the maximum, over all
component state machines, of the minimum of the number of entries
and the number of exits of each component. We also study the precise
relationship between RSMs and closely related models.



1 Introduction 

In traditional model checking, the model is a finite state machine whose vertices 
correspond to system states and whose edges correspond to system transitions. 
In this paper we consider the analysis of recursive state machines (RSMs), in 
which vertices can either be ordinary states or can correspond to invocations of 
other state machines in a potentially recursive manner. RSMs can model control 
flow in typical sequential imperative programming languages with recursive pro- 
cedure calls. Alternatively, RSMs can be viewed as a variant of visual notations 
for hierarchical state machines, such as Statecharts mu and UML |S|, where 
concurrency is disallowed but recursion is allowed. 

More precisely, a recursive state machine consists of a set of component 
machines. Each component has a set of nodes (atomic states) and boxes (each of 
which is mapped to a component), a well-defined interface consisting of entry and 
exit nodes, and edges connecting nodes/boxes. An edge entering a box models the 
invocation of the component associated with the box, and an edge leaving a box 
corresponds to a return from that component. Due to recursion, the underlying
global state-space is infinite and behaves like a pushdown system. While RSMs
are closely related to pushdown systems, which are studied in verification and
program analysis in many disguises wm, RSMs appear to be the appropriate
definition for visual modeling and allow tighter analysis.

We study the two most fundamental questions for model checking of safety 
and liveness properties, respectively: (1) reachability, given sets of initial and 
target nodes, is some target node reachable from an initial one, and (2) cycle 
detection: given sets of initial and target nodes, is there a cycle containing a 
target node reachable from an initial node. For cycle detection, there are two 
natural variants depending on whether or not one requires the recursion depth 
to be bounded in infinite computations. We show that all these problems can
be solved in time $O(n\theta^2)$, where n is the size of the RSM, and θ is a parameter
depending only on the number of entries and exits in each component. The
number of entry points corresponds to the parameters passed to a component,
while the number of exit points corresponds to the values returned. More precisely,
for each component $A_i$, let $d_i$ be the minimum of the number of entries and the
number of exits of that component. Then $\theta = \max_i(d_i)$. Thus, if every component
has either a “small” number of entry points, or a “small” number of exit points,
then θ will be “small”. The space complexity of the algorithms is $O(n\theta)$.
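
The parameter θ is easy to compute from the interface sizes alone. The Python sketch below does so for a hypothetical RSM given as a mapping from component names to (number of entries, number of exits); the representation and names are assumptions made for this example.

```python
def theta(components):
    """Maximum over components of min(#entries, #exits)."""
    return max(min(entries, exits) for entries, exits in components.values())

def time_bound(n, components):
    """The quantity n * theta^2 appearing in the time bound (constants omitted)."""
    return n * theta(components) ** 2

if __name__ == "__main__":
    rsm = {"A1": (1, 5), "A2": (4, 2), "A3": (3, 3)}   # hypothetical components
    print(theta(rsm))            # A3 dominates: min(3, 3) = 3
    print(time_bound(100, rsm))
```

A component with one entry and many exits contributes only 1 to θ, which is why single-entry (or single-exit) machines admit the best bounds.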

The first, and key, computational step in the analysis of RSMs involves deter- 
mining reachability relationships among entry and exit points of each component. 
We show how the information required for this computation can be encoded as 
recursive Datalog-like rules of a special form. To enable efficient analysis, our 
rules will capture forward reachability from entry points for components with a 
small number of entries, and backward reachability from exit points for the other 
components. The solution to the rules can then be reduced to alternating reacha- 
bility for AND-OR (game) graphs. In the second step of our algorithm, we reduce 
the problems of reachability and cycle detection with bounded/unbounded recur- 
sion depth to traditional graph-theoretic analysis on appropriately constructed 
graphs based on the information computed in the first step. Our algorithms for 
cycle detection lead immediately to algorithms for model checking for
linear-time requirements expressed as LTL formulas or Büchi automata, via a product
construction for Büchi automata with RSMs.
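
Alternating reachability on an AND-OR graph, to which the rule solving is reduced, can be computed by a standard linear-time worklist procedure. The Python sketch below is a generic textbook version and not the specific construction of this paper; the graph encoding (a kind for each node and a list of distinct predecessors) and the function name are assumptions of the example. An OR node is reached once some predecessor is reached, an AND node once all of its predecessors are.

```python
from collections import deque

def alternating_reach(kind, preds, sources):
    """Nodes reached from sources in an AND-OR graph.

    kind maps every node to "AND" or "OR"; preds maps a node to the list of
    its (distinct) predecessors; sources are reached unconditionally.
    """
    succs = {v: [] for v in kind}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    # Number of predecessors of each node that have not been reached yet.
    missing = {v: len(preds.get(v, [])) for v in kind}
    reached = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in succs[u]:
            if v in reached:
                continue
            missing[v] -= 1
            if kind[v] == "OR" or missing[v] == 0:
                reached.add(v)
                queue.append(v)
    return reached
```

Each edge is examined once, so the running time is linear in the size of the graph.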



Related Work. Our definition of recursive state machines naturally
generalizes the definition of hierarchical state machines of p. For hierarchical state
machines, the underlying state-space is guaranteed to be finite, but can be
exponential in the size of the original machine. Algorithms for analysis of
hierarchical state machines are adaptations of traditional search algorithms to avoid
searching the same component repeatedly, and have the same time complexity
as the algorithms of this paper. However, the “bottom-up” algorithms used in
P for hierarchical machines cannot be applied to RSMs.

RSMs are closely related to pushdown systems. Model checking of pushdown
systems has been studied extensively for both linear- and branching-time
requirements [hyipks). These algorithms are based on an automata-theoretic approach.
Each configuration is viewed as a string over stack symbols, and the reachable
configurations are shown to be a regular set that can be computed by a fixpoint
computation. Esparza et al jH] do a careful analysis of the time and space
requirements for various problems including reachability and cycle detection. The
resulting worst case complexity is cubic, and thus matches our worst case when
$\theta = O(n)$. Their approach also leads, under more refined analysis, to the bound
$O(nk^2)$ 0, where n is the size of the pushdown system and k is its number
of control states. We will see that the number of control states of a pushdown
system is related to the number of exit nodes in RSMs, but that by working with
RSMs directly we can achieve better bounds in terms of θ.

Ball and Rajamani consider the model of Boolean programs, which can be
viewed as RSMs extended with boolean variables [?]. They have implemented
a BDD-based symbolic model checker that solves the reachability problem for
Boolean programs. The main technique is to compute the summary of the
input-output relation of a procedure. This in turn is based on algorithms for
interprocedural dataflow analysis [?], which are generally cubic. As described in
Section 5, when translating Boolean programs to RSMs, one must pay the standard
exponential price to account for different combinations of values of the
variables, but the price of analysis need not be cubic in the expanded state-space
if one makes a careful distinction between local, read-global, and write-global
variables.

In the context of this rich history of research, the current paper has four
main contributions. First, while equivalent to pushdown systems and Boolean
programs in theory, recursive state machines are a more direct, visual,
state-based model of recursive control flow. Second, we give algorithms with time
and space bounds of $O(n\theta^2)$ and $O(n\theta)$, respectively, and thus our
solution for analysis is more efficient than the generally cubic algorithms for
related models, even when these were geared specifically to solve flow problems in
control graphs of sequential programs. Third, our algorithmic technique for both
reachability analysis and cycle detection, which combines mutually dependent
forward and backward reachability analyses using a natural Datalog formulation
and AND-OR graph accessibility, along with the analysis of an augmented ordinary
graph, is new and potentially useful for solving related problems in program
analysis to mitigate similar cubic bottlenecks. We also anticipate that it is more
suitable for on-the-fly model checking and early error detection than the prior
automata-theoretic solutions for analysis of pushdown systems. Finally, using our
RSM model one is able to, at no extra cost in complexity, distinguish between
infinite accepting executions that require a “bounded call stack” and those that
require an “unbounded call stack”. This distinction had not been considered in any
previous paper.

Note: Results similar to ours have been obtained independently, and submitted
concurrently, by [?] on a model identical to RSMs.

2 Recursive State Machines

Syntax. A recursive state machine (RSM) $A$ over a finite alphabet $\Sigma$ is given
by a tuple $\langle A_1, \ldots, A_k \rangle$, where each component state machine
$A_i = (N_i \cup B_i, Y_i, En_i, Ex_i, \delta_i)$ consists of the following pieces:

- A set $N_i$ of nodes and a (disjoint) set $B_i$ of boxes.

- A labeling $Y_i : B_i \mapsto \{1, \ldots, k\}$ that assigns to every box an index of
one of the component machines, $A_1, \ldots, A_k$.

- A set of entry nodes $En_i \subseteq N_i$, and a set of exit nodes $Ex_i \subseteq N_i$.

- A transition relation $\delta_i$, where transitions are of the form $(u, \sigma, v)$
where (1) the source $u$ is either a node of $N_i$, or a pair $(b, x)$, where $b$ is a
box in $B_i$ and $x$ is an exit node in $Ex_j$ for $j = Y_i(b)$; (2) the label $\sigma$ is
either $\varepsilon$, a silent transition, or in $\Sigma$; and (3) the destination $v$ is
either a node in $N_i$ or a pair $(b, e)$, where $b$ is a box in $B_i$ and $e$ is an
entry node in $En_j$ for $j = Y_i(b)$.




Fig. 1. A Sample Recursive State Machine. 



We will use the term ports to refer collectively to the entry and exit nodes of a
machine $A_i$, and will use the term vertices of $A_i$ to refer to its nodes and the
ports of its boxes that participate in some transition. That is, the transition
relation $\delta_i$ is a set of labelled directed edges on the set $V_i$ of vertices of
the machine $A_i$. We let $E_i$ be the set of underlying edges of $\delta_i$, ignoring
labels. Figure 1 illustrates the definition. The sample RSM has three components.
The component $A_1$ has 4 nodes, of which $u_1$ and $u_2$ are entry nodes and $u_4$ is
the exit node, and two boxes, of which $b_1$ is mapped to component $A_2$ and $b_2$ is
mapped to $A_3$. The entry and exit nodes are the control interface of a component
by which it can communicate with the other components. Intuitively, think of
component state machines as procedures, and an edge entering a box at a given
entry as invoking the procedure associated with the box with given argument
values. Entry-nodes are analogous to arguments while exit-nodes model values
returned.



Semantics. To define the executions of RSMs, we first define the global states
and transitions associated with an RSM. A (global) state of an RSM
$A = \langle A_1, \ldots, A_k \rangle$ is a tuple $\langle b_1, \ldots, b_r, u \rangle$ where
$b_1, \ldots, b_r$ are boxes and $u$ is a node. Equivalently, a state can be viewed as
a string, and the set $Q$ of global states of $A$ is $B^* N$, where $B = \bigcup_i B_i$
and $N = \bigcup_i N_i$. Consider a state $\langle b_1, \ldots, b_r, u \rangle$ such that
$b_i \in B_{j_i}$ for $1 \le i \le r$ and $u \in N_j$. Such a state is well-formed if
$Y_{j_i}(b_i) = j_{i+1}$ for $1 \le i < r$ and $Y_{j_r}(b_r) = j$. A well-formed state of
this form corresponds to the case when the control is inside the component $A_j$,
which was entered via box $b_r$ of component $A_{j_r}$ (the box $b_{r-1}$ gives the
context in which $A_{j_r}$ was entered, and so on). Henceforth, we assume states to
be well-formed.

We define a (global) transition relation $\delta$. Let $s = \langle b_1, \ldots, b_r, u \rangle$
be a state with $u \in N_j$ and $b_r \in B_m$. Then, $(s, \sigma, s') \in \delta$ iff one of
the following holds:

1. $(u, \sigma, u') \in \delta_j$ for a node $u'$ of $A_j$, and $s' = \langle b_1, \ldots, b_r, u' \rangle$.

2. $(u, \sigma, (b', e)) \in \delta_j$ for a box $b'$ of $A_j$, and $s' = \langle b_1, \ldots, b_r, b', e \rangle$.

3. $u$ is an exit-node of $A_j$, $((b_r, u), \sigma, u') \in \delta_m$ for a node $u'$ of $A_m$,
and $s' = \langle b_1, \ldots, b_{r-1}, u' \rangle$.

4. $u$ is an exit-node of $A_j$, $((b_r, u), \sigma, (b', e)) \in \delta_m$ for a box $b'$ of
$A_m$, and $s' = \langle b_1, \ldots, b_{r-1}, b', e \rangle$.

Case 1 is when the control stays within the component $A_j$, case 2 is when a
new component is entered via a box of $A_j$, case 3 is when the control exits $A_j$
and returns back to $A_m$, and case 4 is when the control exits $A_j$ and enters a
new component via a box of $A_m$. The global states $Q$ along with the transition
relation $\delta$ define an ordinary transition system, denoted $T_A$.
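
The four cases of the global transition relation can be sketched as a successor function. This is only an illustration under assumed conventions that are not part of the paper: nodes and boxes are strings with globally unique names, a box port is a pair `(box, node)`, and the dictionaries `delta`, `node_comp`, and `box_comp` are hypothetical encodings of the component transition relations and of the component containing each node or box.

```python
def successors(state, delta, node_comp, box_comp):
    """Return all (label, next_state) pairs of a well-formed global state.

    state        -- tuple (b_1, ..., b_r, u) of box names followed by a node name
    delta[j]     -- list of transitions (src, label, dst) of component A_j
    node_comp[u] -- index of the component that contains node u
    box_comp[b]  -- index of the component that contains box b
    """
    *stack, u = state
    j = node_comp[u]
    result = []
    # Cases 1 and 2: a transition of A_j that leaves node u.
    for src, label, dst in delta[j]:
        if src == u:
            if isinstance(dst, tuple):      # case 2: enter box b' at entry e
                b2, e = dst
                result.append((label, (*stack, b2, e)))
            else:                           # case 1: stay inside A_j
                result.append((label, (*stack, dst)))
    # Cases 3 and 4: u is an exit node and control returns through box b_r.
    if stack:
        b_r = stack[-1]
        m = box_comp[b_r]
        for src, label, dst in delta[m]:
            if src == (b_r, u):
                if isinstance(dst, tuple):  # case 4: return, then enter box b'
                    b2, e = dst
                    result.append((label, (*stack[:-1], b2, e)))
                else:                       # case 3: return to a node of A_m
                    result.append((label, (*stack[:-1], dst)))
    return result
```

Because the call stack is carried explicitly in the state, repeatedly applying `successors` enumerates $T_A$, which may be infinite; the algorithms of Section 3 avoid this enumeration.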

We wish to consider recursive automata as generators of $\omega$-languages. For
this, we augment RSMs with a designated set of initial nodes, and with Büchi
acceptance conditions. A recursive Büchi automaton (RBA) over an alphabet
$\Sigma$ consists of an RSM $A$ over $\Sigma$, together with a set
$Init \subseteq \bigcup_{i=1}^{k} En_i$ of initial nodes and a set
$F \subseteq \bigcup_{i=1}^{k} N_i$ of repeating (accepting) nodes. (If $F$ is not
given, by default we assume $F = \bigcup_{i=1}^{k} N_i$ to associate a language
$L(A)$ with RSM $A$ and its $Init$ set.) Given an RBA $(A, Init, F)$, we obtain an
(infinite) global Büchi automaton $B_A = (T_A, Init^*, F^*)$, where the initial
states $Init^*$ are states $\langle e \rangle$ where $e \in Init$, and where a state
$\langle b_1, \ldots, b_r, v \rangle$ is in $F^*$ if $v$ is in $F$. For an infinite word
$w = w_0 w_1 \ldots$ over $\Sigma$, a run $\pi$ of $B_A$ over $w$ is a sequence
$s_0 \sigma_0 s_1 \sigma_1 s_2 \cdots$ of states $s_i$ and symbols
$\sigma_i \in \Sigma \cup \{\varepsilon\}$ such that (1) $s_0 \in Init^*$,
(2) $(s_i, \sigma_i, s_{i+1}) \in \delta$ for all $i$, and (3) the word $w$ equals
$\sigma_0 \sigma_1 \sigma_2 \cdots$ with all the $\varepsilon$ symbols removed. A run
$\pi$ is accepting if for infinitely many $i$, $s_i \in F^*$.

We call a run $\pi$ bounded if there is an integer $m$ such that for all $i$, the
length of the tuple $s_i$ is bounded by $m$. It is unbounded otherwise. In other
words, in a bounded (infinite) run the stack-length (number of boxes in context)
always stays bounded. A word $w \in \Sigma^\omega$ is (boundedly/unboundedly) accepted
by the RBA $A$ if there is an accepting (bounded/unbounded) run of $B_A$ on $w$.
Note, $w$ is boundedly accepted iff for some $s \in F^*$ there is a run $\pi$ on $w$
for which $s_i = s$ infinitely often. This is not so for unbounded accepting runs.

We let $L(A)$, $L_b(A)$ and $L_u(A)$ denote the set of words accepted, boundedly
accepted, and unboundedly accepted by $A$, respectively. Clearly,
$L_b(A) \cup L_u(A) = L(A)$, but $L_b(A)$ and $L_u(A)$ need not be disjoint. Given
RBA $A$, we will be interested in two central algorithmic problems:

1. Reachability: Given $A$, for nodes $u$ and $v$ of $A$, let $u \Rightarrow v$ denote
that some global state $\langle b_1, \ldots, b_r, v \rangle$, whose node is $v$, is
reachable from the global state $\langle u \rangle$ in the global transition system
$T_A$. Extending the notation, let $Init \Rightarrow v$ denote that for some
$u \in Init$, $u \Rightarrow v$. Our goal in simple reachability analysis is to compute
the set $\{u \mid Init \Rightarrow u\}$ of reachable vertices.

2. Language emptiness: We want to determine if $L(A)$, $L_b(A)$ and $L_u(A)$ are
empty or not. We obtain thereby algorithms for model checking RSMs.



Notation. We use the following notation. Let $v_i$ be the number of vertices and
$e_i$ the number of transitions (edges) of each component $A_i$, and let
$v = \sum_i v_i$, $e = \sum_i e_i$ be the total number of vertices and edges. The size
$|A|$ of an RSM $A$ is the space needed to write down its components. Assuming,
w.l.o.g., that each node and each box of each component is involved in at least
one transition, $v \le 2e$ and the size of $A$ is proportional to its number of edges
$e$. The other parameter that enters in the complexity is $\theta$, a bound on the
number of entries or exits of the components. Let $en_i = |En_i|$ and
$ex_i = |Ex_i|$ be the number of entries and exits in the $i$'th component, $A_i$.
Then $\theta = \max_{i \in \{1, \ldots, k\}} \min(en_i, ex_i)$. That is, every component
has either no more than $\theta$ entries or no more than $\theta$ exits. There may be
some components of each kind; we call components of the first kind entry-bound
and the others exit-bound.
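
As a small illustration of the parameter $\theta$, the following sketch computes it from the entry and exit sets and labels each component. The input encoding (a list of `(entries, exits)` pairs) and the function names are assumptions made for this example only.

```python
def theta(components):
    """components: list of (entries, exits) pairs of sets, one per component A_i."""
    return max(min(len(en), len(ex)) for en, ex in components)


def classify(components, th):
    """Call a component entry-bound if it has at most th entries, else exit-bound."""
    return ["entry-bound" if len(en) <= th else "exit-bound"
            for en, _ex in components]
```

For instance, a component with two entries and one exit, together with a component with one entry and three exits, gives $\theta = 1$; the first is then exit-bound and the second entry-bound.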



3 Algorithms for State-Space Analysis 

Given a recursive automaton, $A$, in this section we show how problems such
as reachability and language emptiness can be solved in time $O(|A|\theta^2)$; more
precisely, in time $O(e\theta + v\theta^2)$ and space $O(e + v\theta)$. For notational
convenience, we will assume without loss of generality that all entry nodes of the
machines have no incoming edges and all exit nodes have no outgoing edges.

3.1 Reachability 

Given $A$, we wish to compute the set $\{u \mid Init \Rightarrow u\}$. For clarity, we
present our algorithm in two stages. First, we define a set of Datalog rules and
construct an associated AND-OR graph $G_A$, which can be used to compute
information about reachability within each component automaton. Next, we use
this information to obtain an ordinary graph $H_A$, such that we can compute the
set $\{u \mid Init \Rightarrow u\}$ by simple reachability analysis on $H_A$.



Step 1: The Rules and the AND-OR Graph Construction. As a first
step we will compute, for each component $A_i$, a predicate (relation) $R_i(x,y)$. If
$A_i$ is entry-bound, then the variable $x$ ranges over all entry nodes of $A_i$ and
$y$ ranges over all vertices of $A_i$. If $A_i$ is exit-bound, then $x$ ranges over all
vertices of $A_i$ and $y$ ranges over all exit nodes of $A_i$. The meaning of the
predicate is defined as follows: $R_i(x,y)$ holds iff there is a path in $T_A$ from
$\langle x \rangle$ to $\langle y \rangle$.

The predicates $R_i(x,y)$ are determined by a series of simple recursive
relationships which we will write in the style of Datalog rules [?]. Recall some
terminology. An atom is a term $P(\tau)$ where $P$ is a predicate (relation) symbol
and $\tau$ is a tuple of appropriate arity consisting of variables and constants from
appropriate domains. A ground atom has only constants. A Datalog rule has the
form head $\leftarrow$ body, where head is an atom and body is a conjunction of atoms.
The meaning of a rule is that if for some instantiation $\sigma$, mapping variables of
a rule to constants, all the (instantiated) conjuncts in the body of the rule
$\sigma$(body) are true, then the instantiated head $\sigma$(head) must also be true. For
readability, we deviate slightly from this notation and write the rules as
“head $\leftarrow$ body, under constraint $C$”, where body includes only recursive
predicates, and nonrecursive constraints are in $C$. We now list the rules for the
predicates $R_i$. We distinguish two cases depending on whether the component $A_i$
has more entries or exits. Suppose first that $A_i$ is entry-bound. Then, we have
the following three rules. (Technically, there is one instance of rule 3 for each
box $b$ of $A_i$.)

1. $R_i(x,x)$ , $x \in En_i$

2. $R_i(x,w) \leftarrow R_i(x,u)$ , $x \in En_i$, $(u,w) \in E_i$

3. $R_i(x,(b,w)) \leftarrow R_i(x,(b,u)) \wedge R_j(u,w)$ , $x \in En_i$, $b \in B_i$,
$Y_i(b) = j$, $u \in En_j$, $w \in Ex_j$.

Rule 1 says every entry node $x$ can reach itself. Rule 2 says if an entry $x$ can
reach vertex $u$ which has an edge to vertex $w$, then $x$ can reach $w$. Rule 3 says
if entry $x$ of $A_i$ can reach an entry port $(b,u)$ of a box $b$, mapped say to the
$j$'th component $A_j$, and the entry $u$ of $A_j$ can reach its exit $w$, then $x$ can
reach the exit port $(b,w)$ of box $b$; we further restrict the domain to only apply
this rule for ports of $b$ that are vertices (i.e., $(b,u)$, $(b,w)$ are incident to
some edges of $A_i$). Rules for exit-bound component machines $A_i$ are similar.

1. $R_i(x,x)$ , $x \in Ex_i$

2. $R_i(u,x) \leftarrow R_i(w,x)$ , $x \in Ex_i$, $(u,w) \in E_i$

3. $R_i((b,u),x) \leftarrow R_i((b,w),x) \wedge R_j(u,w)$ , $x \in Ex_i$, $b \in B_i$,
$Y_i(b) = j$, $u \in En_j$, $w \in Ex_j$.

The Datalog program can be evaluated incrementally by initializing the
relations with all ground atoms corresponding to instantiations of heads of rules
with empty body (i.e., the atoms $R_i(x,x)$ for all entries/exits $x$ of $A_i$), and
then using the rules repeatedly to derive new ground atoms that are heads of
instantiations of rules whose bodies contain only atoms that have been already
derived. As we’ll see below, if implemented properly, the time complexity is
bounded by the number of possible instantiated rules and the space is bounded by
the number of possible ground atoms. The number of possible ground atoms of the
form $R_i(x,y)$ is at most $v_i\theta$, and thus the total number of ground atoms is at
most $v\theta$. The number of instantiated rules of type 1 is bounded by the number of
nodes, and the number of rules of type 2 is at most $e\theta$. The number of
instantiated rules of type 3 is at most $v\theta^2$.

The evaluation of the Datalog program can be seen equivalently as the
evaluation (reachability analysis) of a corresponding AND-OR graph
$G_A = (V, E, Start)$. Recall that an AND-OR graph is a directed graph $(V,E)$ whose
vertices $V = V_\vee \cup V_\wedge$ consist of a disjoint union of and-vertices,
$V_\wedge$, and or-vertices, $V_\vee$, and a subset of vertices $Start$ is given as the
initial set. Reachability is defined inductively: a vertex $p$ is reachable if:
(a) $p \in Start$, or (b) $p$ is a $\vee$-vertex and $\exists p'$ such that
$(p',p) \in E$ and such that $p'$ is reachable, or (c) $p$ is a $\wedge$-vertex and
for $\forall p'$ such that $(p',p) \in E$, $p'$ is reachable. It is well-known that
reachability in AND-OR graphs can be computed in linear time (see, e.g., [5]).
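
The linear-time bound can be illustrated with the usual counter-based worklist scheme. The sketch below is not taken from the paper: the function name and the edge-list encoding are assumptions, and every vertex not listed in `and_vertices` is treated as an or-vertex.

```python
from collections import defaultdict, deque


def and_or_reachable(and_vertices, edges, start):
    """Return the reachable vertices of an AND-OR graph.

    and_vertices -- set of AND-vertices (all other vertices are OR-vertices)
    edges        -- iterable of (p, q) pairs, meaning an edge from p to q
    start        -- iterable of initial vertices
    """
    succ = defaultdict(list)
    missing = defaultdict(int)   # per AND-vertex: predecessors not yet reached
    for p, q in edges:
        succ[p].append(q)
        if q in and_vertices:
            missing[q] += 1
    reached = set(start)
    # An AND-vertex without predecessors is reachable vacuously (clause (c)).
    reached |= {a for a in and_vertices if missing[a] == 0}
    queue = deque(reached)
    while queue:
        p = queue.popleft()
        for q in succ[p]:
            if q in reached:
                continue
            if q in and_vertices:
                missing[q] -= 1
                if missing[q] > 0:   # some predecessor is still unreached
                    continue
            reached.add(q)
            queue.append(q)
    return reached
```

Each vertex is dequeued at most once and each edge is inspected at most once, which gives a running time linear in the size of the graph.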

We can define from the rules an AND-OR graph $G_A$ with one $\vee$-vertex for
each ground atom $R_i(x,y)$ and one $\wedge$-vertex for each instantiated body of a
rule with two conjuncts, i.e., rule of type 3. The set $Start$ of initial vertices is
the set of ground atoms resulting from the instantiations of rules 1 that have
empty bodies. Each instantiated rule of type 2 and 3 introduces the following
edges: For a rule of type 2 (one conjunct in the body) we have an edge from the
($\vee$-vertex corresponding to the ground) atom in the body of the rule to the atom
in the head. For an instantiated rule of type 3, we have edges from the
$\vee$-vertices corresponding to the ground atoms in the body to the $\wedge$-vertex
corresponding to the body, and from the $\wedge$-vertex to the $\vee$-vertex
corresponding to the head. It can be shown that the reachable $\vee$-vertices in the
AND-OR graph correspond precisely to the ground atoms that are derived by the
Datalog program.

The AND-OR graph has $O(v\theta)$ $\vee$-vertices, $O(v\theta^2)$ $\wedge$-vertices and
$O(e\theta + v\theta^2)$ edges and can be constructed in a straightforward way and
evaluated in this amount of time. However, it is not necessary to construct the
graph explicitly. Note that the $\wedge$-vertices have only one outgoing edge, so
there is no reason to store them: once a $\wedge$-vertex is reached, it can be used
to reach the successor $\vee$-vertex and there is no need to remember it any more.
Indeed, evaluation methods for Datalog programs maintain only the relations of
the program recording the tuples (ground atoms) that are derived. We describe now
how to evaluate the program within the stated time and space bounds.

Process the edges of the components Ai to compute the set of vertices and 
record the following information: If Ai is entry-bound (respectively, exit-bound) 
create the successor list (resp. predecessor list) for each vertex. For each box, 
create a list of its entries and exits that are vertices, i.e., have some incident 
edges. For each component Ai and each of its ports u create a list of all boxes b 
in all the machines of the RSM A that are mapped to Ai in which the port u of 
b has an incident edge (is a vertex). The reason for the last two data structures 
is that it is possible that many of the ports of the boxes have no incident edges, 
and we do not want to waste time looking at them, since our claimed complexity 
bounds charge only for ports that have incident edges. It is straightforward to 
compute the above information from a linear scan of the edges of the RSM A. 

Each predicate (relation) $R_i$ can be stored using either a dense or a sparse
representation. For example, a dense representation is a bit-array indexed by the
domain (possible tuples) of the relation, i.e., $En_i \times V_i$ or $V_i \times Ex_i$.
Initially all the bits are 0, and they are turned to 1 as new tuples (ground
atoms) are derived. We maintain a list $S$ of tuples that have been derived but not
processed. The processing order (e.g., FIFO or LIFO or any other) is not
important. Initially, we insert into $S$ (and set their corresponding bits) all the
ground atoms from rule 1, i.e., atoms of the form $R_i(x,x)$ for all entries $x$ of
entry-bound machines $A_i$ and exits of exit-bound machines $A_i$. In the iterative
step, as long as $S$ is not empty, we remove an atom $R_i(x,y)$ from $S$ and process
it. Suppose that $A_i$ is entry-bound (the exit-bound case is similar). Then we do
the following. For every edge $(y,z) \in E_i$ out of $y$, we check if $R_i(x,z)$ has
been already derived (its bit is 1) and if not, then we set its bit to 1 and insert
$R_i(x,z)$ into $S$. If $y$ is an entry node of a box $b$, i.e. $y = (b,u)$, where say
$b$ is mapped to $A_j$, then for every exit vertex $(b,w)$ of $b$ we check if
$R_j(u,w)$ holds; if it does and if $R_i(x,(b,w))$ has not been derived, we set its
bit and insert $R_i(x,(b,w))$ into $S$. If $y$ is an exit of $A_i$, then for every box
$b$ that is mapped to $A_i$ and in which the corresponding port $(b,y)$ is a vertex we
do the following. Let $A_k$ be the machine that contains the box $b$. If $(b,x)$ is
not a vertex of $A_k$ nothing needs to be done. Otherwise, if $A_k$ is entry-bound
(respectively, exit-bound), we check for every entry (respectively, exit) $z$ of
$A_k$ whether the corresponding rule 3 can be fired, that is, whether
$R_k(z,(b,x))$ holds but $R_k(z,(b,y))$ does not (respectively, $R_k((b,y),z)$ holds
but $R_k((b,x),z)$ does not). If so, we set the bit of $R_k(z,(b,y))$ (resp.,
$R_k((b,x),z)$) and insert the atom into $S$. Correctness follows from the following
lemma.

Lemma 1. A tuple $(x,y)$ is added to the predicate $R_i$ iff $R_i(x,y)$ is true in the
given RSM $A$, i.e., $\langle x \rangle$ can reach $\langle y \rangle$ in the transition
system $T_A$.



Theorem 1. Given RSM $A$, all predicates $R_i$ can be computed in time
$O(|A|\theta^2)$ and space $O(|A|\theta)$. (More precisely, time $O(e\theta + v\theta^2)$
and space $O(e + v\theta)$.)
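
The worklist evaluation can be sketched as follows. To keep the example short it makes simplifying assumptions that are not in the paper: every component is treated as entry-bound (so only the first set of rules is used), node and box names are globally unique strings, entry and exit sets are disjoint, box ports are pairs `(box, node)`, and the dictionary layout of `comps` is a hypothetical encoding. A Python set stands in for the bit-array.

```python
from collections import defaultdict


def summaries(comps):
    """comps[i] = {"entries": set, "exits": set, "edges": [(u, w)], "boxes": {b: j}}.

    Returns R with R[i] = set of pairs (x, y) such that entry x of A_i
    reaches vertex y of A_i.
    """
    succ = {i: defaultdict(list) for i in comps}
    verts = {i: set(c["entries"]) for i, c in comps.items()}
    callers = defaultdict(list)          # callers[j] = [(i, b)] with Y_i(b) = j
    for i, c in comps.items():
        for u, w in c["edges"]:
            succ[i][u].append(w)
            verts[i].update((u, w))      # both endpoints are vertices of A_i
        for b, j in c["boxes"].items():
            callers[j].append((i, b))

    R = {i: set() for i in comps}
    work = []                            # the list S of unprocessed atoms

    def derive(i, x, y):
        if (x, y) not in R[i]:
            R[i].add((x, y))
            work.append((i, x, y))

    for i, c in comps.items():           # rule 1
        for x in c["entries"]:
            derive(i, x, x)
    while work:
        i, x, y = work.pop()
        for w in succ[i][y]:             # rule 2
            derive(i, x, w)
        if isinstance(y, tuple):         # rule 3, fired by its first conjunct
            b, u = y
            j = comps[i]["boxes"][b]
            if u in comps[j]["entries"]:
                for w in comps[j]["exits"]:
                    if (u, w) in R[j] and (b, w) in verts[i]:
                        derive(i, x, (b, w))
        elif y in comps[i]["exits"]:     # rule 3, fired by its second conjunct
            for k, b in callers[i]:
                if (b, y) in verts[k]:
                    for z in comps[k]["entries"]:
                        if (z, (b, x)) in R[k]:
                            derive(k, z, (b, y))
    return R
```

Each atom enters the worklist at most once, so the work done is proportional to the number of rule instances that are examined, in line with the counting argument above.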



Step 2: The Augmented Graph $H_A$. Having computed the predicates $R_i$,
for each component, we know the reachability among its entry and exit nodes.
We need to determine the set of nodes reachable from the initial set $Init$ in a
global manner. For this, we build an ordinary graph $H_A$ as follows. The set of
vertices of $H_A$ is $V = \bigcup_i V_i$, the set of vertices of all the components.
The set of edges of $H_A$ consists of all the edges of the components, and the
following additional edges. For every box $b$ of $A_i$, say $b$ is mapped to $A_j$,
include edges from the entry vertices $(b,u)$ of $b$ to the exit vertices $(b,w)$
such that $R_j(u,w)$ holds. Lastly, add an edge from each entry vertex $(b,u)$ of a
box to the corresponding entry node $u$ of the component $A_j$ to which $b$ is
mapped. The main claim about $H_A$ is:

Lemma 2. $u \Rightarrow v$ in RSM $A$ iff $v$ is reachable from $u$ in $H_A$.

Thus, to compute $\{u \mid Init \Rightarrow u\}$, all we need to do is a linear-time
depth first search in $H_A$. Clearly, $H_A$ has $v$ vertices and $e + v\theta$ edges.
Thus we have:

Theorem 2. Given an RSM $A$, the set $\{u \mid Init \Rightarrow u\}$ of reachable
nodes can be computed in time $O(|A|\theta^2)$ and space $O(|A|\theta)$.
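
Step 2 can be sketched as a graph construction followed by a depth first search. The encoding is assumed for illustration only: `comps` uses a hypothetical dictionary layout with globally unique node and box names and box ports written as pairs `(box, node)`, and `R[j]` is taken to contain the entry-to-exit pairs of $A_j$ computed in Step 1.

```python
def reachable_vertices(comps, R, init):
    """Build the edges of H_A and return the vertices reachable from init.

    comps[i] = {"entries": set, "exits": set, "edges": [(u, w)], "boxes": {b: j}}
    R[j]     = set of pairs (u, w) with entry u of A_j reaching exit w of A_j
    """
    adj = {}

    def add_edge(p, q):
        adj.setdefault(p, []).append(q)

    for i, c in comps.items():
        ports = set()                        # box ports of A_i that are vertices
        for u, w in c["edges"]:
            add_edge(u, w)                   # ordinary edge of the component
            ports.update(p for p in (u, w) if isinstance(p, tuple))
        for b, u in ports:
            j = c["boxes"][b]
            if u in comps[j]["entries"]:
                add_edge((b, u), u)          # edge into the called component
                for w in comps[j]["exits"]:
                    if (u, w) in R[j] and (b, w) in ports:
                        add_edge((b, u), (b, w))   # summary edge across the box

    seen, stack = set(init), list(init)      # iterative depth first search
    while stack:
        p = stack.pop()
        for q in adj.get(p, []):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen
```

The summary edges let the search step over a box without tracking a call stack, which is why an ordinary graph search suffices.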

In invariant verification we are given RSM $A$, and a set $T$ of target nodes,
and want to determine if $Init \Rightarrow v$ for some $v \in T$. This problem can be
solved as above in the given complexity. Note that, unlike reachability in FSMs,
this problem is PTIME-complete even for single-entry, non-recursive, RSMs [?].

For conceptual clarity, we have presented the reachability algorithm as a 
two-stage process. However, the two stages can be combined and carried out 
simultaneously, and this is what one would do in practice. In fact, we can do 
this on-the-fly, and have the reachability process drive the computation and 
trigger the rules; that is, we derive tuples involving vertices only when they
are reached by the search procedure. This is especially important if the RSM
A is not given explicitly but has to be generated from an implicit description 
dynamically on-the-fly. We defer further elaboration to the full paper. 



3.2 Checking Language Emptiness 



Given an RBA $A = (\langle A_1, \ldots, A_k \rangle, Init, F)$, we wish to determine
whether $L(A)$, $L_b(A)$, and $L_u(A)$ are empty. Since $L_b(A) \cup L_u(A) = L(A)$,
it suffices to determine emptiness for $L_b(A)$ and $L_u(A)$. We need to check
whether there are any bounded or unbounded accepting runs in
$B_A = (T_A, Init^*, F^*)$. Our algorithm below for emptiness testing treats edges
labeled by $\varepsilon$ no differently than ordinary edges. This makes the algorithm
report that $L(A)$ is non-empty even when the only infinite accepting runs in $A$
include an infinite suffix of $\varepsilon$-labeled edges. We show in the next section
how to overcome this. Our algorithm proceeds in the same two-stage fashion as our
algorithm for reachability. Instead of computing predicates $R_i(x,y)$, we compute
a different predicate $Z_i(x,y)$ with the same domain $En_i \times V_i$ or
$V_i \times Ex_i$, depending on whether $A_i$ is entry- or exit-bound. $Z_i$ is defined
as follows: $Z_i(x,y)$ holds iff there is a path in $B_A$ from $\langle x \rangle$ to
$\langle y \rangle$ which passes through an accept state in $F^*$. We can compute the
$Z_i$’s by rules analogous to those for the $R_i$’s. In fact, having previously
computed the $R_i$’s, we can use that information to greatly simplify the rules
governing the $Z_i$’s, so that the corresponding rules are linear and can be
evaluated by doing reachability in an ordinary graph (instead of an AND-OR
graph). The rules for an entry-bound machine $A_i$ are as follows.

1. $Z_i(x,y)$ , if $R_i(x,y)$ and ($x \in F$ or $y \in F$), $x \in En_i$, $y \in V_i$

2. $Z_i(x,w) \leftarrow Z_i(x,u)$ , for $x \in En_i$, $(u,w) \in E_i$

3a. $Z_i(x,(b,w)) \leftarrow Z_i(x,(b,u))$ , if $R_j(u,w)$, $x \in En_i$, $b \in B_i$,
$Y_i(b) = j$, $u \in En_j$, $w \in Ex_j$

3b. $Z_i(x,(b,w)) \leftarrow Z_j(u,w)$ , if $R_i(x,(b,u))$, $x \in En_i$, $b \in B_i$,
$Y_i(b) = j$, $u \in En_j$, $w \in Ex_j$

The rules for exit-bound components $A_i$ are similar. Let $G'_A$ be an ordinary
graph whose vertices are the possible ground atoms $Z_i(x,y)$ and with edges
$(t_1, t_2)$ for each instantiated rule $t_2 \leftarrow t_1$. The set $Start$ of initial
vertices is the ground atoms from rule 1. Then the reachable vertices are
precisely the set of true ground atoms $Z_i(x,y)$. $G'_A$ has $O(v\theta)$ vertices
and $O(e\theta + v\theta^2)$ edges. Again we do not need to construct it explicitly,
but can store only its vertices and generate its edges as needed from the rules.



Lemma 3. All predicates $Z_i$ can be computed in time $O(|A|\theta^2)$ and space
$O(|A|\theta)$.

Having computed the $Z_i$’s, we can analyze the graph $H_A$ for cycle detection. Let
$F_a$ be the set of edges of $H_A$ of the form $((b,x),(b,y))$, connecting an entry
vertex to an exit vertex of a box $b$, mapped say to $A_j$, and for which $Z_j(x,y)$
holds. Let $F_u$ be the set of edges of the form $((b,x),x)$ where $(b,x)$ is a
vertex and $x$ is an entry of the component to which box $b$ is mapped (i.e., the
edges that correspond to recursive invocations).

Lemma 4. The language $L_u(A)$ is nonempty iff there is a cycle in $H_A$ reachable
from some vertex in $Init$, such that the cycle contains both: (1) an edge in $F_a$
or a vertex in $F$, and (2) an edge in $F_u$.

We need a modified version $H'_A$ of $H_A$ in order to determine emptiness of
$L_b(A)$ efficiently. The graph $H'_A$ is the same as $H_A$ except the invocation
edges in $F_u$ are removed. Also, the set of initial vertices needs to be modified:
let $En'$ denote the vertices $en$ of $H_A$, where $en$ is an entry node of some
component in $A$, and $en$ is reachable from some vertex in $Init$ in $H_A$.

Lemma 5. $L_b(A)$ is nonempty iff there is a cycle in $H'_A$ reachable from some
vertex in $En'$, such that the cycle contains an edge in $F_a$ or a vertex in $F$.

Both $H_A$ and $H'_A$ have $v$ vertices and $O(e + v\theta)$ edges. We can check the
conditions in the two lemmas in linear time in the graph size, using standard
cycle detection algorithms.

Theorem 3. Given RBA $A$, we can check emptiness of $L(A)$, $L_b(A)$ and $L_u(A)$
in time $O(|A|\theta^2)$ and space $O(|A|\theta)$.
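
One standard way to check the cycle condition of Lemma 4 in linear time is through strongly connected components: a suitable cycle exists iff some reachable component contains an invocation edge together with an accepting edge or an accepting vertex. The sketch below uses Kosaraju's two-pass method; the function name, the adjacency-dictionary encoding of $H_A$, and the names `Fa` and `Fu` for the two edge sets are assumptions for this example, and the paper does not prescribe a particular cycle detection algorithm.

```python
def has_unbounded_accepting_cycle(adj, init, F, Fa, Fu):
    """adj: dict vertex -> list of successors in H_A; Fa, Fu: sets of edges (p, q)."""
    # Restrict attention to the vertices reachable from init.
    seen, stack = set(init), list(init)
    while stack:
        p = stack.pop()
        for q in adj.get(p, ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    # First pass: record vertices in order of DFS completion.
    order, visited = [], set()
    for root in seen:
        if root in visited:
            continue
        visited.add(root)
        st = [(root, iter(adj.get(root, ())))]
        while st:
            p, it = st[-1]
            for q in it:
                if q not in visited:
                    visited.add(q)
                    st.append((q, iter(adj.get(q, ()))))
                    break
            else:                       # all successors of p are finished
                order.append(p)
                st.pop()
    # Second pass: search the reversed graph in reverse completion order.
    radj = {}
    for p in seen:
        for q in adj.get(p, ()):
            radj.setdefault(q, []).append(p)
    comp = {}                           # vertex -> representative of its SCC
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root
        st = [root]
        while st:
            p = st.pop()
            for q in radj.get(p, ()):
                if q not in comp:
                    comp[q] = root
                    st.append(q)

    def sccs_with_edge(edges):
        return {comp[p] for p, q in edges
                if p in comp and q in comp and comp[p] == comp[q]}

    accepting = sccs_with_edge(Fa) | {comp[x] for x in F if x in comp}
    return bool(sccs_with_edge(Fu) & accepting)
```

The same routine, run on the graph with the invocation edges removed and with a different acceptance test, would serve for the bounded case of Lemma 5.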

3.3 Model Checking with Büchi Automata

The input to the automata-based model checking problem consists of an RSM
$A$ over $\Sigma$ and an ordinary Büchi automaton $B$ over $\Sigma$. The model checking
problem is to determine whether the intersection $L(A) \cap L(B)$ is empty, or
whether $L_b(A) \cap L(B)$ ($L_u(A) \cap L(B)$) is empty if we wish to restrict to
bounded (unbounded) runs. Having given algorithms for determining emptiness of
$L(A)$, $L_b(A)$, and $L_u(A)$ for RBAs, what model checking requires is a product
construction, which given a Büchi automaton $B$ and RSM $A$, constructs an RBA that
accepts the intersection of the languages of the two. Also, in the last section we
ignored $\varepsilon$’s; now we show how to disallow an infinite suffix of
$\varepsilon$’s in an accepting run. We modify the given Büchi automaton $B$ to obtain
$B' = (Q'_B, \delta'_B, I_B, F_B)$ as follows. For every state $q$, we add an extra
state $q'$. We add a transition from $q$ to $q'$ labeled with $\varepsilon$, from $q'$
to itself labeled with $\varepsilon$, and for every transition $(q, \sigma, q'')$ of
$B$, add a transition $(q', \sigma, q'')$. $B'$ accepts exactly the same words as
$B$, except it allows a finite number of $\varepsilon$’s to be interspersed between
consecutive characters in a word from $L(B)$.
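
The construction of $B'$ from $B$ can be sketched directly from this description. The encoding is an assumption made for the example: transitions are triples, the silent label $\varepsilon$ is represented by the empty string, and the extra copy of a state `q` is the pair `(q, "prime")`. The initial and accepting sets are left unchanged and are therefore not passed in.

```python
EPS = ""  # hypothetical encoding of the silent label epsilon


def add_epsilon_states(states, transitions):
    """Return (states', transitions') of B' given the states and transitions of B."""
    new_states = set(states)
    new_trans = set(transitions)
    for q in states:
        qp = (q, "prime")
        new_states.add(qp)
        new_trans.add((q, EPS, qp))    # q  --eps--> q'
        new_trans.add((qp, EPS, qp))   # q' --eps--> q'
    for q, a, q2 in transitions:
        new_trans.add(((q, "prime"), a, q2))   # q' --a--> q''
    return new_states, new_trans
```

Since no primed state is accepting, a run that ends in an infinite sequence of silent steps stays in primed states forever and is never accepting, which is the purpose of the modification.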

The product RBA $A' = A \otimes B$ of $A$ and $B$ is defined as follows. $A'$ has the
same number of components as $A$. For every component $A_i$ of $A$, the entry-nodes
$En'_i$ of $A'_i$ are $En_i \times Q'_B$, and the exit-nodes $Ex'_i$ of $A'_i$ are
$Ex_i \times Q'_B$. The nodes $N'_i$ of $A'_i$ are $N_i \times Q'_B$ while the boxes
$B'_i$ are the same as $B_i$ with the same label (that is, a box mapped to $A_j$ is
mapped to $A'_j$). Transitions $\delta'_i$ within each $A'_i$ are defined as follows.
Consider a transition $(v, \sigma, v')$ in $\delta_i$. Suppose $v \in N_i$. Then for
every transition $(q, \sigma, q')$ of $\delta'_B$, $A'_i$ has a transition
$((v,q), \sigma, (v',q'))$ if $v'$ is a node, and a transition
$((v,q), \sigma, (b,(e,q')))$ if $v' = (b,e)$. The case when $v = (b,x)$ is handled
similarly. Repeating nodes of $A'$ are nodes of the form $(v,q)$ with $q \in F_B$.
The construction guarantees that $L(A \otimes B) = L(A) \cap L(B)$ (and
$L_b(A \otimes B) = L_b(A) \cap L(B)$ and $L_u(A \otimes B) = L_u(A) \cap L(B)$).
Analyzing the cost of the product, we get the following theorem:

Theorem 4. Let $A$ be an RSM of size $n$ with $\theta$ as the maximum, over its
components, of the minimum of the number of entry-nodes and exit-nodes, and let
$B$ be a Büchi automaton of size $m$ with $a$ states. Then, checking emptiness of
$L(A) \cap L(B)$, and of $L_b(A) \cap L(B)$ and $L_u(A) \cap L(B)$, can be solved in
time $O(n \cdot m \cdot a^2 \cdot \theta^2)$ and space $O(n \cdot m \cdot a \cdot \theta)$.
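
The bounds of Theorem 4 can be read off from Theorem 3 applied to the product. The derivation below is a sketch under the assumptions that the product $A'$ has size $O(n \cdot m)$ (each transition of $A$ is paired with transitions of $B'$) and that pairing entries and exits with the $O(a)$ states of $B'$ multiplies the parameter $\theta$ by $O(a)$; the symbol $\theta'$ is introduced here only for the product's parameter.

```latex
\begin{align*}
|A'| &= O(n \cdot m), \qquad \theta' = O(a \cdot \theta), \\
\text{time}  &= O\bigl(|A'| \cdot \theta'^{\,2}\bigr)
              = O\bigl(n \cdot m \cdot (a\theta)^2\bigr)
              = O(n \cdot m \cdot a^2 \cdot \theta^2), \\
\text{space} &= O\bigl(|A'| \cdot \theta'\bigr)
              = O(n \cdot m \cdot a \cdot \theta).
\end{align*}
```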

4 Relation to Pushdown Systems 

The relation between recursive automata and pushdown automata is fairly tight.
Consider a pushdown system (PDS) given by $P = (Q_P, \Gamma, \Delta)$ over an
alphabet $\Sigma$ consisting of a set of control states $Q_P$, a stack alphabet
$\Gamma$, and a transition relation
$\Delta \subseteq (Q_P \times \Gamma) \times \Sigma \times (Q_P \times \{push(\Gamma), swap(\Gamma), pop\})$.
That is, based on the control state and the symbol on top of the stack, the
machine can update the control state, and either push a new symbol, swap the
top-of-the-stack with a new symbol, or pop the stack. When $P$ is augmented with a
Büchi acceptance condition, the $\omega$-language of the pushdown machine, $L(P)$,
can then be defined in a natural way. Given a PDS $P$ (or RBA $A$), we can build a
recursive automaton $A$ (PDS $P$, respectively) such that $L(P) = L(A)$, and
$|A| \in O(|P|)$ ($|P| \in O(|A|)$, respectively). Translating from $P$ to $A$, $A$
has one component with number of exits (and hence $\theta$) bounded by $|Q_P|$.
Translating from $A$ to $P$, the number of control states of $P$ is bounded by the
maximum number of exit points in $A$, not by $\theta$. Both translations preserve
boundedness. We have to omit details. For the detailed construction, please see
the full paper.

5 Discussion 

Efficiency and Context-Free Reachability: Given a recursive automaton
of size $n$ with $\theta$ maximum entry/exit-nodes per component, our reachability
algorithm takes time $O(n \cdot \theta^2)$ and space $O(n\theta)$. It is unlikely that
our complexity can be substantially improved. Consider the standard parsing
problem of testing CFL-membership: for a fixed context-free grammar $G$, and given
a string $w$ of length $n$, we wish to determine if $G$ can generate $w$. The classic
C-K-Y algorithm for this problem requires $O(n^3)$ time. Using fast matrix
multiplication, Valiant [?] was able to slightly improve the asymptotic bound, but
his algorithm is highly impractical. A related problem is CFL-reachability [?],
where for a fixed grammar $G$, we are given a directed, edge-labeled, graph $H$,
having size $n$, with designated source and target nodes $s$ and $t$, and we wish to
determine whether $s$ can reach $t$ in $H$ via a path labeled by a string in $L(G)$.
CFL-membership is the special case of this problem where $H$ is just a simple
path labeled by $w$. Unlike CFL-membership, CFL-reachability is P-complete,
and the best known algorithms require $\Omega(n^3)$ time [?]. Using the close
correspondence between recursive automata and context-free grammars, a grammar
$G$ can be translated to a recursive automaton $A_G$. To test CFL-reachability, we
can take the product of $A_G$ with $H$, and check for reachability. The product
has size $O(n)$, with $O(n)$ entry-nodes per component, and $O(n)$ exit-nodes per
component. Thus, since our reachability algorithm runs in time $O(n^3)$ in this
case, better bounds on reachability for recursive automata would lead to better
than cubic bounds for parsing a string and for CFL-reachability.



Extended Recursive Automata: In the presence of variables, our algorithms
can be adapted in a natural way by augmenting nodes with the values of the
variables. Suppose we have an extended recursive automaton $A$ with boolean
variables (similar observations apply to more general finite domains), and the
edges have guards and assignments that read/write these variables. Suppose each
component refers to at most $k$ variables (local or global), and that it has at most
either $d$ input variables or $d$ output variables (i.e., global variables that it
reads or writes, or parameters passed to and from it). Then, we can construct a
recursive automaton of size at most $2^k \cdot |A|$ with the same number of
components. The derived automaton has $\theta = 2^d$. Thus, reachability problems for
such an extended recursive automaton can be solved in time $O(2^{k+2d} \cdot |A|)$.
Note that such extended recursive automata are basically the same as the boolean
programs of [?].



Concurrency: We have considered only sequential recursive automata. Recursive
automata define context-free languages. Consequently, it is easy to establish
that typical reachability analysis problems for a system of communicating
recursive automata are undecidable. Our algorithms can however be used in the
case when only one of the processes is a recursive automaton and the rest are
ordinary state machines. To analyze a system with two recursive processes $M_1$
and $M_2$, one can possibly use abstraction and assume-guarantee reasoning: the
user constructs finite-state abstractions $P_1$ and $P_2$ of $M_1$ and $M_2$,
respectively, and we algorithmically verify that (1) the system with $P_1$ and
$P_2$ satisfies the correctness requirement, (2) the system with $M_1$ and
$P_2$ is a refinement of $P_1$, and (3) the system with $P_1$ and $M_2$ is a
refinement of $P_2$.



Abstract. The paper presents a method, called the method of verification by
invisible invariants, for the automatic verification of a large class of
parameterized systems. The method is based on the automatic calculation of
candidate inductive assertions and checking for their inductiveness, using
symbolic model-checking techniques for both tasks. First, we show how to use
model-checking techniques over finite (and small) instances of the
parameterized system in order to derive candidates for invariant assertions.
Next, we show that the premises of the standard deductive INV rule for proving
invariance properties can be automatically resolved by finite-state
(BDD-based) methods with no need for interactive theorem proving. Combining
the automatic computation of invariants with the automatic resolution of the
VCs (verification conditions) yields a (necessarily) incomplete but fully
automatic sound method for verifying large classes of parameterized systems.
The generated invariants can be transferred to the VC-validation phase without
ever being examined by the user, which explains why we refer to them as
“invisible”. The efficacy of the method is demonstrated by the verification of
diverse parameterized systems in a fully automatic and efficient manner.



1 Introduction 

The problem of uniform verification of parameterized systems is one of the
most challenging problems in verification today. Given a parameterized system
$S(N) : P[1] \,\|\, \cdots \,\|\, P[N]$ and a property $p$, uniform
verification attempts to verify $S(N) \models p$ for every $N > 1$. Model
checking is an excellent tool for debugging parameterized systems because, if
the system fails to satisfy $p$, this failure can be observed for a specific
(and usually small) value of $N$. However, once all the observable bugs have
been removed, there remains the question whether the system is correct for all
$N > 1$.

One method which can always be applied to verify parameterized systems
is deductive verification. To verify that a parameterized system satisfies the
invariance property $\Box p$, we may use rule INV presented in Fig. 1. The
system to be verified is assumed to be a transition system, with an assertion
$\Theta$ describing the initial states, and a state transition relation $\rho$
relating the values of (unprimed) variables in a state to the (primed) values
of the variables in its successor.

$$
\begin{array}{ll}
\text{I1.} & \Theta \rightarrow \varphi \\
\text{I2.} & \varphi \wedge \rho \rightarrow \varphi' \\
\text{I3.} & \varphi \rightarrow p \\
\hline
 & \Box p
\end{array}
$$

Fig. 1. The Invariance Rule INV.

Premise I1 claims that the initial state of the system satisfies $\varphi$.
Premise I2 claims that $\varphi$ remains invariant under $\rho$. An assertion
$\varphi$ satisfying premises I1 and I2 is called inductive. Excluding the rare
cases in which the property $p$ is already inductive, the deductive verifier
has to perform the following tasks:

1. Divine (invent) the auxiliary assertion $\varphi$.

2. Establish the validity of premises I1-I3.

Performing interactive first-order verification of implications such as the premises 
of rule INV for any non-trivial system is never an easy task. Neither is it a 
one-time task, since the process of developing the auxiliary invariants requires 
iterative verification trials, where failed efforts lead to correction of the previous 
candidate assertion into a new candidate. 
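For a finite-state system, the three premises of rule INV can be checked by
plain enumeration. The sketch below is only an illustration under that
assumption: the helper `check_inv` and the toy counter system are hypothetical
and stand in for the symbolic (BDD-based) checks discussed in the paper.

```python
def check_inv(states, init, trans, phi, prop):
    """Return the names of the violated premises of rule INV (empty if none).

    states: iterable of all states of a finite system
    init:   predicate for the initial condition (Theta)
    trans:  function mapping a state to its successors (rho)
    phi:    candidate auxiliary assertion
    prop:   property p to be proved invariant
    """
    states = list(states)
    failed = []
    # I1: every initial state satisfies phi
    if not all(phi(s) for s in states if init(s)):
        failed.append("I1")
    # I2: phi is preserved by every transition
    if not all(phi(t) for s in states if phi(s) for t in trans(s)):
        failed.append("I2")
    # I3: phi implies the property
    if not all(prop(s) for s in states if phi(s)):
        failed.append("I3")
    return failed


# Toy system (an assumption for illustration): a counter modulo 6 that
# starts at 0 and is incremented by 2; the property is "x is never 5".
STATES = range(6)
is_initial = lambda x: x == 0
step = lambda x: [(x + 2) % 6]
never_five = lambda x: x != 5
is_even = lambda x: x % 2 == 0
```

Here `check_inv(STATES, is_initial, step, never_five, never_five)` reports a
failure of I2, because the property is not inductive on its own (state 3
satisfies it but steps to 5), whereas the strengthened assertion `is_even`
passes all three premises.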

In this paper we show that, for a wide class of parameterized systems, both
of these tasks can be automated and performed directly by an appropriately
enhanced model checker. The proposed method, called verification by invisible
invariants, is based on the following idea: We start by computing the set of
all reachable states of $S(N)$ for a sufficiently large $N$. We then project
this set of states on one of the processes, say $P[1]$. Under the assumption
that the system is sufficiently symmetric, we conclude that whatever is true of
$P[1]$ will be true of all other processes. Thus, we abstract the
characterization of all reachable states of process $P[1]$, denoted $\psi(1)$,
into a generic $\psi(j)$ and propose the assertion
$\varphi = \forall j : \psi(j)$ as a candidate for the inductive assertion
which can be used within rule INV. To check that the candidate assertion
$\varphi$ is inductive and also implies the property $p$, we establish a
small-model property which enables checking the premises of rule INV over
$S(N_0)$ for a specific $N_0$ determined by the size of the local state space
of a process in the system. The two tasks of calculating the candidate
assertion $\varphi$ and checking that it satisfies the premises of rule INV are
performed automatically with no user interaction. As a result, the user never
sees, or has to understand, the automatically produced inductive assertion.
This explains the name verification by invisible invariants.

The method of invisible invariants was first presented in [PRZ01], where
it was used to verify a non-trivial cache protocol proposed by Steve German
[Ger00]. The presentation in [PRZ01] allowed the method to be used only for a
very restricted class of systems. The main limitations were that the only
predicates allowed in this class were equality comparisons between
parameterized types, and the only arrays were of type
$[1..N] \mapsto \mathsf{bool}$. In this paper, we extend the applicability of
the method in several dimensions as follows:

• Allowing inequality comparisons of the form $u < v$ between parameterized
types and operations such as $u + 1$ and $u \oplus 1$ (incrementation modulo
$N$).

• Allowing several parameterized types and arrays that map one parameterized
type to another.

These extensions significantly broaden the scope of applicability of the
method, allowing us to deal with diverse examples such as various cache
protocols, a 3-stage pipeline, Szymanski’s algorithm for mutual exclusion, a
token-ring algorithm, a restricted form of the Bakery algorithm, and an
$N$-process version of Peterson’s algorithm for mutual exclusion, all in a
fully automatic and efficient manner.



Related Work. The problem of uniform verification of parameterized systems
is, in general, undecidable [AK86]. There are two possible remedies to this
situation: either we should look for restricted families of parameterized
systems for which the problem becomes decidable, or devise methods which are
sound but necessarily incomplete, and hope that the system of interest will
yield to one of these methods.

Among the representatives of the first approach we can count the work of
German and Sistla, which assumes a parameterized system where processes
communicate synchronously, and shows how to verify single-index properties.
Similarly, Emerson and Namjoshi provided a decision procedure for proving a
restricted set of properties on ring algorithms [EN95], and gave a
PSPACE-complete algorithm for verification of synchronously communicating
processes [EN96]. Many of these methods fail when we move to asynchronous
systems where processes communicate by shared variables. Perhaps the most
advanced work in this approach is the paper [EK00], which considers a general
parameterized system allowing several different classes of processes. However,
this work provides separate algorithms for the cases that the guards are
either all disjunctive or all conjunctive. A protocol such as the cache example
we consider in [PRZ01], which contains some disjunctive and some conjunctive
guards, cannot be handled by the methods of [EK00].



The sound but incomplete methods include methods based on explicit induction,
network invariants, which can be viewed as implicit induction ([KM95], [WL89],
[HLR92], [LHR97]), methods that can be viewed as abstraction and approximation
of network invariants ([BCG86], [SG89], [CGJ95], [KP00]), and other methods
that can be viewed as based on abstraction. Some of these works use structural
induction based on the notion of a network invariant but significantly enhance
its range of applicability by using a generalization of the data-independence
approach which provides a powerful abstraction capability, allowing it to
handle networks with parameterized topologies. Most of these methods require
the user to provide auxiliary constructs, such as a network invariant or an
abstraction mapping. Other attempts to verify parameterized protocols such as
Burns’ protocol and Szymanski’s algorithm ([GZ98], [MAB+94]) relied on
abstraction functions or lemmas provided by the user. The work in [LS97] deals
with the verification of safety properties of parameterized networks by
abstracting the behavior of the system. PVS ([SOR93]) is used to discharge the
generated VCs.

Among the automatic incomplete approaches, we should mention the methods
relying on “regular model-checking”, where a class of systems which include
our bounded-data systems as a special case is analyzed by representing linear
configurations of processes as a word in a regular language. Unfortunately,
many of the systems analyzed by this method cause the analysis procedure to
diverge, and special acceleration procedures have to be applied which, again,
require user ingenuity and intervention.

Other works study symmetry reduction in order to deal with state explosion, in
one case detecting symmetries by inspection of the system description. Closer
in spirit to our work is the work of McMillan on compositional model-checking
(e.g. [McM98b]), which combines automatic abstraction with finite-instantiation
due to symmetry.



2 The Systems We Consider 

Let $\mathbf{type}_0$ denote the set of boolean and fixed (unparameterized)
finite-range basic types which, for simplicity, we often denote as
$\mathsf{bool}$. Let $\mathbf{type}_1, \ldots, \mathbf{type}_m$ be a set of
basic parameterized types, where each $\mathbf{type}_i$ includes integers in
the range $[1..N_i]$ for some $N_i \in \mathbb{N}^+$. The systems we study
include variables that are either $\mathbf{type}_i$ variables, for some
$i \in \{0, \ldots, m\}$, or arrays of the type
$\mathbf{type}_i \mapsto \mathbf{type}_j$ for $i > 0$, $j \ge 0$. For a system
that includes types $\mathbf{type}_1, \ldots, \mathbf{type}_k$, we refer to
$N_1, \ldots, N_k$ as the system’s parameters. Systems are distinguished by
their signatures, which determine the types of variables allowed, as well as
the assertions allowed in the transition relation and the initial condition.
Whenever the signature of a system includes the type
$\mathbf{type}_i \mapsto \mathbf{type}_j$, we assume by default that it also
includes the types $\mathbf{type}_i$ and $\mathbf{type}_j$.

Atomic formulae may compare two expressions of the same type, where the
only allowed expressions are a variable $y$ or an array reference $z[y]$.
Thus, if $y$ and $\tilde{y}$ are $\mathbf{type}_i$ variables, then
$y < \tilde{y}$ is an atomic formula, and so are $z[y] < z[\tilde{y}]$,
$x < z[y]$, and $z[y] < x$ for an array
$z : \mathbf{type}_i \mapsto \mathbf{type}_j$ and $x : \mathbf{type}_j$.

Formulae, used in the transition relation and the initial condition, are
obtained from the atomic formulae by closing them under negation, disjunction,
and existential quantifiers, for appropriately typed quantifiers.

A bounded-data discrete system (BDS) $S = \langle V, \Theta, \rho \rangle$
consists of

• $V$ - A set of system variables, as described above. A state of the system
$S$ provides a type-consistent interpretation of the system variables $V$. For
a state $s$ and a system variable $v \in V$, we denote by $s[v]$ the value
assigned to $v$ by the state $s$. Let $\Sigma$ denote the set of states over
$V$.

• $\Theta(V)$ - The initial condition: A formula characterizing the initial
states.

• $\rho(V, V')$ - The transition relation: A formula, relating the values $V$
of the variables in state $s \in \Sigma$ to the values $V'$ in an $S$-successor
state $s' \in \Sigma$.

For all the systems we consider here, we assume that the transition relation
can be written as

$$
\exists h^1_1, \ldots, h^1_{H_1} : \mathbf{type}_1, \ \ldots, \
h^k_1, \ldots, h^k_{H_k} : \mathbf{type}_k \ \
\forall t^1_1, \ldots, t^1_{T_1} : \mathbf{type}_1, \ \ldots, \
t^k_1, \ldots, t^k_{T_k} : \mathbf{type}_k \ : \ R(\vec{h}, \vec{t}) \qquad (1)
$$

where $R$ is a well-typed quantifier-free formula. It follows that every BDS
is associated with $H_1, \ldots, H_k$ and $T_1, \ldots, T_k$.

Note that $\Theta$ and $\rho$ are restricted to “formulae” as defined above.
Since in this paper we only consider the verification of invariance
properties, we omitted from the definition of a BDS the components that relate
to fairness. When we extend these methods to liveness, we will add the relevant
fairness components.

A computation of the BDS $S = \langle V, \Theta, \rho \rangle$ is an infinite
sequence of states $\sigma : s_0, s_1, s_2, \ldots$, satisfying the
requirements:

• Initiality: $s_0$ is initial, i.e., $s_0 \models \Theta$.

• Consecution: For each $\ell = 0, 1, \ldots$, the state $s_{\ell+1}$ is an
$S$-successor of $s_\ell$. That is,
$\langle s_\ell, s_{\ell+1} \rangle \models \rho(V, V')$ where, for each
$v \in V$, we interpret $v$ as $s_\ell[v]$ and $v'$ as $s_{\ell+1}[v]$.

Mainly, we consider systems with signature
$\langle \mathbf{type}_1 \mapsto \mathbf{type}_0,\ \mathbf{type}_1 \mapsto \mathbf{type}_2 \rangle$.
This signature admits arrays which are subscripted by
$\mathbf{type}_1$-elements and range over either $\mathbf{type}_0$ or
$\mathbf{type}_2$. We name the variables in such a system as follows:
$x_1, \ldots, x_a$ are $\mathbf{type}_0$ variables, $y_1, \ldots, y_b$ are
$\mathbf{type}_1$ variables, $z_1, \ldots, z_c$ are arrays of type
$\mathbf{type}_1 \mapsto \mathbf{type}_0$, $u_1, \ldots, u_d$ are
$\mathbf{type}_2$ variables, and $w_1, \ldots, w_e$ are arrays of type
$\mathbf{type}_1 \mapsto \mathbf{type}_2$.

We keep this naming convention for systems with simpler signatures. Thus,
a system with no $\mathbf{type}_2$ variables will have only $x$-, $y$-, or
$z$-variables. We assume that the description of each system contains a
$z$-variable $\pi$ that includes the location of each process.



3 The Method of Invisible Invariants 

Our goal is to provide an automated procedure to generate proofs according to
INV. While in general the problem is undecidable [AK86], we offer a heuristic
that has proven successful in many cases for the systems we study, where the
strengthening assertions are of the form
$\forall i_1, \ldots, i_I : \psi(\vec{i})$ where $i_1, \ldots, i_I$ are all
mutually distinct typed variables, and $\psi(\vec{i})$ is a quantifier-free
formula. We elaborate the method for the case of systems with signature
$\langle \mathbf{type}_1 \mapsto \mathbf{type}_0,\ \mathbf{type}_1 \mapsto \mathbf{type}_2 \rangle$
defined in Section 2. Thus, we are seeking an assertion of the type
$\forall i^1_1, \ldots, i^1_{I_1}, i^2_1, \ldots, i^2_{I_2} : \psi(\vec{i}^1, \vec{i}^2)$
where $i^\ell_1, \ldots, i^\ell_{I_\ell}$ are all mutually distinct
$\mathbf{type}_\ell$ variables for $\ell = 1, 2$, and
$\psi(\vec{i}^1, \vec{i}^2)$ is a quantifier-free formula. In the next sections
we obtain (small) bounds for the parameters of this family of systems such
that it suffices to prove the premises of INV on systems whose parameters are
bounded by those bounds. This offers a decision procedure for the premises of
rule INV, which greatly simplifies the process of deductive verification. Yet,
it still leaves open the task of inventing the strengthening assertion
$\varphi$, which may become quite complex for all but the simplest systems. In
this section we propose a heuristic for an algorithmic construction of an
inductive assertion for a given BDS. In particular, we provide an algorithm to
construct an inductive assertion of the form we are seeking for a
two-parameter system $S(N_1, N_2)$, where $N_1^0$ and $N_2^0$ are the bounds
computed for the system.

1. Let $reach$ be the assertion characterizing all the reachable states of
system $S(N_1^0, N_2^0)$. Since $S(N_1^0, N_2^0)$ is finite-state, $reach$ can
be computed by standard model-checking techniques.

2. Let $\psi_{I_1, I_2}$ be the assertion obtained from $reach$ by projecting
away all the references to $\mathbf{type}_1$ values other than
$1, \ldots, I_1$, and $\mathbf{type}_2$ values other than $1, \ldots, I_2$.

3. Let $\psi(\vec{i}^1, \vec{i}^2)$ be the assertion obtained from
$\psi_{I_1, I_2}$ by abstraction, which involves the following
transformations: For every $j = 1, \ldots, I_1$ and $k = 1, \ldots, I_2$,
replace any reference to $z_r[j]$ by a reference to $z_r[i^1_j]$, any
reference to $w_r[j] = k$ (resp. $w_r[j] \neq k$) by a reference to
$w_r[i^1_j] = i^2_k$ (resp. $w_r[i^1_j] \neq i^2_k$), any sub-formula of the
form $y_r = j$, $j \le I_1$, by the formula $y_r = i^1_j$, any sub-formula of
the form $y_r = v$ for $v > I_1$ by the formula
$\bigwedge_{j=1}^{I_1} y_r \neq i^1_j$, any sub-formula of the form $u_r = k$,
$k \le I_2$, by the formula $u_r = i^2_k$, and any sub-formula of the form
$u_r = v$ for $v > I_2$ by the formula $\bigwedge_{k=1}^{I_2} u_r \neq i^2_k$.

4. Let $\varphi := \bigwedge_{1 \le i^1_1 < \cdots < i^1_{I_1},\ 1 \le i^2_1 < \cdots < i^2_{I_2}} \psi(\vec{i}^1, \vec{i}^2)$.

5. Check that $\varphi$ is inductive over $S(N_1^0, N_2^0)$.

6. Check that $\varphi \rightarrow p$ is valid.

If tests (5) and (6) both yield a positive result, then the property $p$ has
been verified. The procedure described here is a generalization of a similar
procedure in [PRZ01].
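The six steps above can be sketched with explicit-state enumeration on a small
semaphore-based mutual exclusion system. Everything in this sketch is an
assumption made for illustration: the three-location encoding of MUX-SEM, the
function names, and the use of brute-force enumeration in place of the
BDD-based computations used in the paper.

```python
from itertools import combinations, product

IDLE, TRY, CRIT = 0, 1, 2  # hypothetical process locations


def successors(state):
    """Successors of a state (x, locs), where x is the semaphore value."""
    x, locs = state
    for i, loc in enumerate(locs):
        if loc == IDLE:
            yield (x, locs[:i] + (TRY,) + locs[i + 1:])
        elif loc == TRY and x == 1:
            yield (0, locs[:i] + (CRIT,) + locs[i + 1:])
        elif loc == CRIT:
            yield (1, locs[:i] + (IDLE,) + locs[i + 1:])


def reachable(n):
    """Step 1: all reachable states of the n-process instance."""
    init = (1, (IDLE,) * n)
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def project(reach, k):
    """Step 2: keep only the semaphore and processes 1..k."""
    return {(x, locs[:k]) for x, locs in reach}


def phi(state, psi, k):
    """Steps 3-4: psi must hold for every increasing choice of k processes."""
    x, locs = state
    return all((x, tuple(locs[i] for i in idx)) in psi
               for idx in combinations(range(len(locs)), k))


def mutex(state):
    return state[1].count(CRIT) <= 1


def verify(n0, k=2):
    """Steps 5-6, by enumerating all states of the n0-process instance."""
    psi = project(reachable(n0), k)
    all_states = [(x, locs) for x in (0, 1)
                  for locs in product((IDLE, TRY, CRIT), repeat=n0)]
    good = [s for s in all_states if phi(s, psi, k)]
    initial_ok = phi((1, (IDLE,) * n0), psi, k)
    inductive = all(phi(t, psi, k) for s in good for t in successors(s))
    implies_p = all(mutex(s) for s in good)
    return initial_ok and inductive and implies_p
```

With this encoding, `verify(4)` succeeds when projecting on two processes,
while `verify(4, k=1)` fails: the single-index candidate is not inductive,
which illustrates why the number of quantified indices matters.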

4 Obtaining the Bounds 

In this section we obtain (small) bounds for the parameters of various systems 
according to their signatures, and show that it suffices to prove the premises of 
INV on systems whose parameters are within these bounds. We first present our 
main claim, which establishes the bounds for the most general system. 

Consider a BDS $S(N_1, N_2)$ with signature
$\langle \mathbf{type}_1 \mapsto \mathsf{bool},\ \mathbf{type}_1 \mapsto \mathbf{type}_2 \rangle$
to which we wish to apply proof rule INV with the assertions $\varphi$ and $p$
having each the form
$\forall i^1_1, \ldots, i^1_{I_1}, i^2_1, \ldots, i^2_{I_2} : \psi(\vec{i}^1, \vec{i}^2)$,
where every $i^1_j$ (resp. $i^2_j$) is a $\mathbf{type}_1$ (resp.
$\mathbf{type}_2$) variable, and $\psi(\vec{i}^1, \vec{i}^2)$ is a
quantifier-free formula. The transition relation of the system is described by
equation (1) with $k = 2$.

Consider a state $s$ of the system $S(N_1, N_2)$. The size of $s$ is
$(N_1, N_2)$. We say that $(N_1, N_2)$ is smaller than $(N_1', N_2')$, and
denote it by $(N_1, N_2) \preceq (N_1', N_2')$, if $N_1 \le N_1'$ and
$N_2 \le N_2'$.

Lemma 1. The premises of rule INV are valid over $S(N_1, N_2)$ for all
$(N_1, N_2) \succeq (2, 2)$ iff they are valid over $S(N_1, N_2)$ for all
$(N_1, N_2) \preceq (N_1^0, N_2^0)$, where $N_1^0 = b + I_1 + H_1$ and
$N_2^0 = d + I_2 + H_2 + e(b + I_1 + H_1)$.

Proof. The most complex verification condition in rule INV is premise (I2),
which can be written as:
$(\forall j : \psi(j)) \wedge (\exists h \forall t : R(h, t)) \rightarrow \forall i : \psi'(i)$.
To prove the claim, it suffices to show that if the formula

$$
(\forall j : \psi(j)) \wedge (\forall t : R(h, t)) \wedge \neg\psi'(i) \qquad (2)
$$

is satisfiable over a state of size $(N_1, N_2) \succeq (2, 2)$, it is also
satisfiable over a state of size $(a_1, a_2) \preceq (N_1^0, N_2^0)$.

Let $s$ be a state of size $(N_1, N_2) \succeq (2, 2)$ which satisfies
assertion (2). Let $Val_1 \subseteq [1..N_1]$ be the set of (pairwise) distinct
values the state $s$ assigns to the variables
$V^1_{aug} = \{h^1_1, \ldots, h^1_{H_1}, i^1_1, \ldots, i^1_{I_1}, y_1, \ldots, y_b\}$.
Let $a_1 = |Val_1|$; obviously, $a_1 \le H_1 + I_1 + b = N_1^0$ and
$a_1 \le N_1$. Similarly, let $Val_2 \subseteq [1..N_2]$ be the set of
(pairwise) distinct values the state $s$ assigns to the variables

$$
V^2_{aug} = \{h^2_1, \ldots, h^2_{H_2}, i^2_1, \ldots, i^2_{I_2}, u_1, \ldots, u_d\}
\cup \{w_\ell[k] : \ell = 1, \ldots, e,\ k \in Val_1\}.
$$

Let $a_2 = |Val_2|$; obviously,
$a_2 \le H_2 + I_2 + d + e(H_1 + I_1 + b) = N_2^0$ and $a_2 \le N_2$.

For every $\ell = 1, 2$, assume the distinct $\mathbf{type}_\ell$ values are
sorted $v^\ell_1 < v^\ell_2 < \cdots < v^\ell_{a_\ell}$. Let $\Pi_\ell$ be a
permutation on $[1..N_\ell]$ such that for every $j = 1, \ldots, a_\ell$,
$\Pi_\ell^{-1}(v^\ell_j) = j$. The two permutations, $\Pi_1$ and $\Pi_2$, help
construct a new state where the range of values assigned to the variables in
$V^1_{aug}$ (resp. $V^2_{aug}$) is reduced from $[1..N_1]$ (resp. $[1..N_2]$)
to $[1..a_1]$ (resp. $[1..a_2]$).

Consider now a state $s'$ of size $(a_1, a_2)$, where: $x'_r = x_r$ for every
$r \in [1..a]$, $y'_r = \Pi_1^{-1}(y_r)$ for every $r \in [1..b]$,
$z'_r[k] = z_r[\Pi_1(k)]$ for every $r \in [1..c]$ and $k \in [1..a_1]$,
$u'_r = \Pi_2^{-1}(u_r)$ for every $r \in [1..d]$, and
$w'_r[k] = \Pi_2^{-1}(w_r[\Pi_1(k)])$ for every $r \in [1..e]$ and
$k \in [1..a_1]$.

It is easy to establish, by induction on the structure of the quantifier-free
formulae $\psi$ and $R$, that the evaluation of formula (2) over $s'$ yields
the same truth values as the evaluation of formula (2) over $s$. Consequently,
$s'$ is a state of size $(a_1, a_2)$ that satisfies formula (2). □
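The reduction used in the proof is an order-preserving renaming of the index
values that actually occur. The following sketch illustrates it for a
simplified single-type state with index variables and one boolean array; the
state layout and the names `reduce_state` and `sample_formula` are assumptions
made for this example only.

```python
def reduce_state(y, z):
    """Shrink a state over [1..N] to one over [1..a] by renaming by rank.

    y: dict mapping index-variable names to values in [1..N]
    z: dict mapping every index in [1..N] to a boolean (an array)
    Returns (new_y, new_z) over the range [1..a], where a is the number of
    distinct values occurring in y.
    """
    values = sorted(set(y.values()))  # v_1 < v_2 < ... < v_a
    rank = {v: j for j, v in enumerate(values, start=1)}  # role of Pi^{-1}
    new_y = {name: rank[v] for name, v in y.items()}
    new_z = {rank[v]: z[v] for v in values}  # z'[k] = z[Pi(k)]
    return new_y, new_z


def sample_formula(y, z):
    """A quantifier-free formula using only order tests and array lookups."""
    return y["h"] < y["i"] and z[y["i"]] and not z[y["h"]]


# A state of size N = 10 in which only the values 3 and 7 are referenced.
big_y = {"h": 3, "i": 7, "j": 7}
big_z = {k: k >= 5 for k in range(1, 11)}
small_y, small_z = reduce_state(big_y, big_z)
```

Because the renaming preserves order and equality among the referenced values,
`sample_formula` evaluates to the same truth value on the original state and
on the reduced state of size 2.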



The Class $\langle \mathbf{type}_1 \mapsto \mathsf{bool} \rangle$. This is the
class of systems which have boolean and other finite-domain parameterized
arrays. The algorithms belonging to this class are MUX-SEM (mutual exclusion by
semaphores), a 3-stages pipeline ([BD94], [McM98a]), Steve German’s cache
([Ger00], [PRZ01]), and the Illinois’ Cache Algorithm, all studied in [PRZ01].
In addition, it includes Szymanski’s mutual exclusion algorithm and token ring
algorithms.

This class extends the class of systems considered in [PRZ01], which only
allowed comparison for equality between two $y_r$ variables, by allowing tests
for inequalities. Since there are no $\mathbf{type}_2$ variables in the system,
an immediate consequence of Lemma 1 is:



Corollary 1. Let $S(N)$ be a parameterized system of signature
$\langle \mathbf{type}_1 \mapsto \mathsf{bool} \rangle$ to which we wish to
apply proof rule INV with the assertions $\varphi$ and $p$ having each the form
$\forall i_1, \ldots, i_I : \psi(i_1, \ldots, i_I)$. Then, the premises of rule
INV are valid over $S(N)$ for all $N > 1$ iff they are valid over $S(N)$ for
all $N$, $1 < N \le N_0$, where $N_0 = b + I + H$.

In Fig. 2 we present a mutual exclusion algorithm due to B. Szymanski [Szy88],
which uses inequality comparisons between process indices. In the system,
$b = 0$ and $H = 2$. The property to be verified for system SZYMANSKI is mutual
exclusion, which can be specified by
$p : \forall i \neq j : \neg(at\_\ell_7[i] \wedge at\_\ell_7[j])$, so that
$I = 2$. Applying Corollary 1 then leads to a cutoff value of $N_0 = 4$.

    in     N : integer where N > 1
    local  zw, zs : array [1..N] of boolean where zw = zs = 0

    P[i] :: loop forever do
        l0 : NonCritical
        l1 : await ∀j : ¬zs[j]
        l2 : (zw[i], zs[i]) := (1, 1)
        l3 : if ∃j : at_l1,2[j]
               then zs[i] := 0; go-to l4
               else zw[i] := 0; go-to l5
        l4 : await ∃j : zs[j] ∧ ¬zw[j] then (zw[i], zs[i]) := (0, 1)
        l5 : await ∀j : ¬zw[j]
        l6 : await ∀j : j < i : ¬zs[j] ∧ ¬zw[j]
        l7 : Critical
        l8 : zs[i] := 0

Fig. 2. Parameterized Mutual Exclusion Algorithm SZYMANSKI.

Transition Relations with “+1” or “⊕1” Constraints: Some of the parameterized
systems which we wish to verify have atomic sub-formulae of the forms
$h_2 = h_1 + 1$ or $h_2 = h_1 \oplus 1$ (which stands for
$h_2 = (h_1 \bmod N) + 1$) within their transition relations. We resolve this
difficulty by observing that

$$
\exists h_1, h_2 : h_2 = h_1 + 1 \wedge \forall t : R(h_1, h_2, t)
\iff
\exists h_1, h_2 : h_1 < h_2 \wedge (\forall t : t \le h_1 \vee h_2 \le t)
\wedge \forall t : R(h_1, h_2, t)
$$

$$
\exists h_1, h_2 : h_2 = h_1 \oplus 1 \wedge \forall t : R(h_1, h_2, t)
\iff
\exists h_1, h_2 :
\big( (h_1 < h_2 \wedge \forall t : t \le h_1 \vee h_2 \le t)
\vee (h_2 < h_1 \wedge \forall t : h_2 \le t \le h_1) \big)
\wedge \forall t : R(h_1, h_2, t)
$$

In the first translation, we ensure that $h_2 = h_1 + 1$ by requiring that
$h_1$ be smaller than $h_2$ and that, for every other index $t$, either $t$ is
smaller than or equal to $h_1$ or it is greater than or equal to $h_2$. In the
second translation, expected to capture the constraint $h_2 = h_1 \oplus 1$, we
repeat the characterization of $h_2 = h_1 + 1$ but also allow the option that
$h_1 = N$ and $h_2 = 1$. This is ensured by
$h_2 < h_1 \wedge \forall t : h_2 \le t \le h_1$. Since
$(\forall t : P(t) \vee \forall t : Q(t)) \iff \forall t_1, t_2 : (P(t_1) \vee Q(t_2))$,
the formulae above can be easily expressed in the form required for the
transition relation. Thus, the cutoff value established in Corollary 1 is still
valid for both these cases.
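The two order-only characterizations can be checked exhaustively for small
$N$. The sketch below does so by brute force; the function names are
illustrative, and the check covers only the constraint on $h_1, h_2$ itself,
not the surrounding transition relation $R$.

```python
def succ_plain(h1, h2, n):
    """Order-only characterization of h2 = h1 + 1 over indices 1..n."""
    return h1 < h2 and all(t <= h1 or h2 <= t for t in range(1, n + 1))


def succ_mod(h1, h2, n):
    """Order-only characterization of h2 = (h1 mod n) + 1 over indices 1..n."""
    wrap = h2 < h1 and all(h2 <= t <= h1 for t in range(1, n + 1))
    return succ_plain(h1, h2, n) or wrap


def characterizations_agree(n):
    """Compare both characterizations with arithmetic for every pair."""
    for h1 in range(1, n + 1):
        for h2 in range(1, n + 1):
            if succ_plain(h1, h2, n) != (h2 == h1 + 1):
                return False
            if succ_mod(h1, h2, n) != (h2 == (h1 % n) + 1):
                return False
    return True
```

Calling `characterizations_agree(n)` for each `n` from 2 to 8 returns `True`.
For `n = 1` the modular case differs (the only index is its own successor),
which is consistent with the standing assumption $N > 1$.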

In Fig. 3 we present a program which coordinates mutual exclusion by passing
a token around a ring. The signature of the system is
$\langle \mathbf{type}_1 \mapsto \mathsf{bool} \rangle$. The transition
relation for this program includes the atomic formula $h_2 = h_1 \oplus 1$.
The program consists of $N$ client processes $C[1], \ldots, C[N]$ which can
enter their critical section only when they have the token. Process $C[i]$ has
the token when the token variable $token$ equals $i$. In addition, there are
$N$ transmission processes such that process $T[i]$ is responsible for moving
the token from client $C[i]$ to client $C[i \oplus 1]$ whenever process $C[i]$
is in its non-critical section. For this program we have the parameters
$b = 1$ (a single $[1..N]$-variable: $token$), and $H = 2$. According to
Corollary 1, to establish an inductive assertion of the form
$\forall i_1, i_2 : \psi(i_1, i_2)$ for program TOKEN-RING, it suffices to take
a cutoff value of $N_0 = 5$.

    in     N : integer where N > 1
    local  token : [1..N]

    C[i] :: loop forever do
        l0 : NonCritical
        l1 : await i = token
        l2 : Critical

    T[i] :: loop forever do
        m0 : when at_l0[i] ∧ token = i do token := i ⊕ 1

Fig. 3. Parameterized Mutual Exclusion Algorithm TOKEN-RING.

The Class $\langle \mathbf{type}_1 \mapsto \mathsf{bool},\ \mathbf{type}_1 \mapsto \mathbf{type}_2 \rangle$.
In Fig. 4 we present a program which implements a restricted version of the
Bakery Algorithm by Lamport.

    in     N1, N2 : integer where N1 > 1, N2 > 1
    local  w : array [1..N1] of [1..N2] where w = N1
           z : array [1..N1] of boolean where z = 0

    C[i] :: loop forever do
        l0 : NonCritical
        l1 : ⋁(u = 1..N2) when ∀j : (¬z[j] ∨ u > w[j]) do (z[i], w[i]) := (1, u)
        l2 : await ∀j : (¬z[j] ∨ w[i] ≤ w[j])
        l3 : Critical
        l4 : z[i] := 0

    R[i] :: loop forever do
        m0 : ⋁(u = 1..N2) when z[i] ∧ ∀j : (¬z[j] ∨ w[j] < u ∨ w[j] > w[i])
             do w[i] := u

Fig. 4. Parameterized Mutual Exclusion Algorithm BAKERY.

In the standard Bakery algorithm the variables $w[i]$ are unbounded natural
numbers. Here we bound them by $N_2$. To make sure that they do not get stuck
at $N_2$ and prevent any new values from being drawn at statement $\ell_1$, we
have the reducing process $R[i]$ which attempts to identify a gap just below
the current value of $w[i]$. Such a gap is a positive natural number $u$
smaller than $w[i]$ which is not currently occupied by any $w[j]$ for an active
$C[j]$, and such that all active $w[j]$ are either below $u$ or above $w[i]$.
Client $C[j]$ is considered active if $z[j] = 1$. On identifying such a gap
$u$, process $R[i]$ reduces $w[i]$ to $u$. The disjunction
$\bigvee_{u=1}^{N_2}$ occurring at statements $\ell_1$ and $m_0$ denotes a
non-deterministic choice over all possible values of $u$ in the range
$[1..N_2]$, provided the chosen value of $u$ satisfies the condition appearing
in the enclosed when statement.

The property of mutual exclusion can be written as
$p : \forall i \neq j : \neg(at\_\ell_3[i] \wedge at\_\ell_3[j])$. Since both
$i$ and $j$ are of type $\mathbf{type}_1$, this leads to a choice of $I_1 = 2$
and $I_2 = 0$. From the program we can conclude that $b = 0$, $d = 0$, and
$e = 1$ (corresponding to the single $[1..N_1] \mapsto [1..N_2]$ array $w$).
The transition relation can be written in the form
$\exists i, u : \forall t : R$, leading to $H_1 = 1$ (the auxiliary variable
$i$) and $H_2 = 1$ (the auxiliary variable $u : \mathbf{type}_2$). Using these
numbers, we obtain a cutoff value pair $(N_1^0, N_2^0) = (3, 4)$.

Arbitrary Stratified Systems. Lemma 1 can be generalized to systems with
arbitrary array types, as long as the type structure is stratified, i.e.,
$i < j$ for each type $\mathbf{type}_i \mapsto \mathbf{type}_j$. Consider a
stratified BDS with $k$ parameterized types
$\mathbf{type}_1, \ldots, \mathbf{type}_k$. Let $b_i$ be the number of
$\mathbf{type}_i$ variables in the system, and let $e_{ij}$ be the number of
$\mathbf{type}_i \mapsto \mathbf{type}_j$ arrays in the system.

Corollary 2. Let $S$ be a $k$-parameter BDS with $k \ge 1$ stratified types to
which we wish to apply proof rule INV with the assertions $\varphi$ and $p$
having each the form
$\forall i^1_1, \ldots, i^1_{I_1}, \ldots, i^k_1, \ldots, i^k_{I_k} : \psi(\vec{i})$.
Then, the premises of rule INV are valid over $S(N_1, \ldots, N_k)$ for all
$N_1, \ldots, N_k > 1$ iff they are valid over $S(N_1^0, \ldots, N_k^0)$ where
$N_1^0 = b_1 + H_1 + I_1$, and for every $i = 2, \ldots, k$,
$N_i^0 = (b_i + H_i + I_i) + \sum_{j=1}^{i-1} (e_{ji} \cdot N_j^0)$.

5 Systems with Unstratified Array Structure 

There are many interesting systems for which the restriction of stratification
does not apply. For example, consider program PETERSON presented in Fig. 5,
which implements a mutual exclusion algorithm due to Peterson. Obviously, this
system has an unstratified array structure.

    in     N : integer where N > 1
    type   Pr_id : [1..N]
           Level : [0..N]
    local  y : array Pr_id of Level where y = 0
           s : array Level of Pr_id

    P[i] :: loop forever do
        l0 : NonCritical
        l1 : (y[i], s[1]) := (1, i)
        l2 : while y[i] < N do
               l3 : await s[y[i]] ≠ i ∨ ∀j ≠ i : y[j] < y[i]
               l4 : (y[i], s[y[i] + 1]) := (y[i] + 1, i)
        l5 : Critical
        l6 : y[i] := 0

Fig. 5. Parameterized Mutual Exclusion Algorithm PETERSON.



When the system has an unstratified array structure, we lose the capability of
reducing any counter-model which violates
$(\forall j : \psi(j)) \wedge (\exists h \forall t : R(h, t)) \rightarrow \forall i : \psi'(i)$
to a model of size not exceeding $N_0$. But this does not imply that we cannot
resolve this verification condition algorithmically. The first step in any
deductive proof of a formula such as the above formula is that of
skolemization, which removes all existential quantifications on the left-hand
side and all universal quantifications on the right-hand side of the
implication, leading to

$$
(\forall j : \psi(j)) \wedge (\forall t : R(h, t)) \rightarrow \psi'(i) \qquad (3)
$$

In subsequent steps, the deductive proof instantiates the remaining universal
quantifications for $j$ and $t$ by concrete terms. Most often these concrete
terms are taken from the (now) free variables of (3), namely, $h$ and $i$.
Inspired by this standard process pursued in deductive verification, we
suggest to replace Formula (3) by

$$
\Big( \bigwedge_{j \in \{h, i\}} \psi(j) \Big) \wedge
\Big( \bigwedge_{t \in \{h, i\}} R(h, t) \Big) \rightarrow \psi'(i) \qquad (4)
$$

which is obtained by replacing the universal quantification over $j$ and $t$
by a conjunction in which each conjunct is obtained by instantiating the
relevant variables ($j$ or $t$) by a subset (allowing replication) of the free
variables $h$ and $i$. The conjunction should be taken over all such possible
instantiations. The resulting quantifier-free formula is not equivalent to the
original formula (3), but the validity of (4) implies the validity of (3). For
a quantifier-free formula such as (4), we have again the property of model
reduction, which we utilize for formulating the appropriate decision procedure
for unstratified systems.

Consider an unstratified system S{N) with b variables of type [1..IV] and e 
arrays of type [1..IV] [l..A^]. As before, let H denote the number of existen- 

tially quantified variables in the definition of p and let I denote the number of 
universally quantified variables in the definition of cp. Furthermore, assume that 
the transition relation or candidate assertion do not contain nested arrays refer- 
ences. For example, we will replace the formula s[2/[f]] yf z by y[i] = h A s[/i] i, 
where h is a fresh auxiliary variable. Let INV'^ denote a version of rule INV in 
which all premises have been skolemized first and then, the remaining univer- 
sal quantifications replaced by conjunctions over all instantiations by the free 
variables in each formula. 

Claim. Let S{N) be a parameterized system as described above. Then if S{Nq) 
satisfies the premises of rule iNV applied to property^ for Nq = (e-|-l)(6-|-/-l-iL), 
we can conclude that p is an invariant of S{N) for every IV > 1. 

For strongly typed systems, such as Peterson, where comparisons and assign- 
ments are only allowed between elements of the same type, we can provide more 
precise bounds. Assume that the system has two types and that each of the 
bounds can be split into two components. Then the bound on N0 can be refined 
into N0 = max(b1 + I1 + H1 + e21(b2 + I2 + H2), b2 + I2 + H2 + e12(b1 + I1 + H1)), 
where e21 and e12 denote the number of type1 ↦ type2 and type2 ↦ type1 
arrays. For the case of Peterson, we have b1 = b2 = 0, I1 = I2 = 2, H1 = 1, 
H2 = 2, and e12 = e21 = 1, which leads to N0 = 7. 
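
The two cutoff bounds are plain arithmetic, so they can be checked mechanically. The following is a minimal sketch; the function names are ours and purely illustrative, and the formulas are transcribed from the claim and the two-type refinement as we read them in the text.

```python
def cutoff_single(b: int, e: int, I: int, H: int) -> int:
    """Bound N0 = (e + 1)(b + I + H) from the claim for unstratified systems."""
    return (e + 1) * (b + I + H)


def cutoff_two_types(b1, I1, H1, b2, I2, H2, e12, e21) -> int:
    """Refined bound for a strongly typed system with two types."""
    s1 = b1 + I1 + H1  # contribution of type-1 variables
    s2 = b2 + I2 + H2  # contribution of type-2 variables
    return max(s1 + e21 * s2, s2 + e12 * s1)


# Peterson, using the values quoted in the text.
peterson = cutoff_two_types(b1=0, I1=2, H1=1, b2=0, I2=2, H2=2, e12=1, e21=1)
print(peterson)  # 7
```

With the Peterson values both arguments of the maximum evaluate to 7, which agrees with the value of N0 stated above.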

6 The Proof of the Pudding 

According to a common saying “the proof of the pudding is in the eating” . In this 
section, we present the experimental results obtained by applying the method of 
invisible invariants to various systems. Table 1 summarizes these results. 

The second column of the table specifies the number of processes used in 
the verification process. In some cases, we took a value higher than the required 
minimum. The T1 column specifies the time (in seconds) it took to compute the 
reachable states. Column T2 specifies the time it took to compute the candidate 
inductive assertion. Finally, column T3 specifies the time it took to check the 
premises of rule INV. 

The systems on the left are single-type systems which only employ 
equality comparisons in their transition relations and candidate assertions. 
SZYMANSKI employs inequalities, and TOKEN-RING needs the relation h2 = h1 ⊕ 1 in 
its transition relation. BAKERY is a stratified two-type system employing 
inequality comparisons, and Peterson is an unstratified two-type system. To obtain 
inductiveness in the Illinois’ cache protocol we had to add an auxiliary variable 
called last-dirty which records the index of the last process which made its cache 
entry dirty. 



Table 1. Summary of Experimental Results. 

System             N0  T1     T2     T3        System      N0     T1     T2     T3
MUX-SEM            5   .01    .01    .01       SZYMANSKI   4      < .01  .06    .06
S. German’s Cache  4   10.21  10.72  133.04    TOKEN-RING  5      < .01  < .01  < .01
Illinois’s Cache   4   1.47   .04    .58       BAKERY      5      .41    .16    .25
3-stages Pipeline  6   20.66  .27    29.59     Peterson    (6,7)  79     1211   240


7 Conclusion and Future Work 

The paper studies the problem of uniform verification of parameterized systems. 
We have introduced the method of verification by invisible invariants, a heuristic 
that has proven successful for fully automatic verification of safety properties for 
many parameterized systems. 

We are currently working on extending the method so that it also encom- 
passes liveness properties. To prove liveness properties, one has to come up with 
a well-founded domain and a ranking function from states into the well-founded 
domain. The ranking function has to be such that no state leads into a higher 
ranked state, and, because of fairness, every state eventually must lead into a 
lower ranked state. Thus, we need to extend the method of invisible invariants to 
generate well founded domains and ranking, as well as to have the counter-part 
of Lemma n to produce cutoff values for the case of liveness properties. 

Abstract. The property of Positive Equality [2] dramatically speeds up validity 
checking of formulas in the logic of Equality with Uninterpreted Functions and 
Memories (EUFM) [4]. The logic expresses correctness of high-level 
microprocessors. We present EVC (Equality Validity Checker), a tool that exploits Positive 
Equality and other optimizations when translating a formula in EUFM to a propo- 
sitional formula, which can then be evaluated by any Boolean satisfiability (SAT) 
procedure. EVC has been used for the automatic formal verification of pipelined, 
superscalar, and VLIW microprocessors. 



1 Introduction 

Formal verification of microprocessors has historically required extensive manual 
intervention. Burch and Dill [4] raised the degree of automation by using flushing 
(feeding the implementation processor with bubbles in order to complete partially 
executed instructions) to compute a mapping from implementation to specification 
states. The correctness criterion is that one step of the implementation should be 
equivalent to 0, or 1, or up to k (for an implementation that can fetch up to k instruc- 
tions per cycle) steps of a specification single-cycle processor when starting from 
equivalent states, where equivalency is determined via flushing. However, the veri- 
fication efficiency has still depended on manually provided case-splitting expres- 
sions [4] [5] when using the specialized decision procedure SVC [16]. In order to 
apply the method to complex superscalar processors, Hosabettu [9] and Sawada 
[15] required months of manual work, using the theorem provers PVS [13] and 
ACL2 [10], respectively. We present EVC, a validity checker for the logic of EUFM, 
as an alternative highly efficient tool. 

2 Hardware Description Language 

In order to be verified with EVC, a high-level implementation processor and its 
specification must be defined in our Hardware Description Language (HDL). That 
HDL is similar to a subset of Verilog [17], except that word-level values do not have 
dimensions but are represented with a single term-level expression, according to the 
syntax of EUFM [4]. Hence, nets are required to be declared of type term or type 
bit. Additionally, a net can be declared as input, e.g., the phase clocks that deter- 
mine the updating of state or the signals that control the flushing. The HDL has con- 
structs for the definition of memories and latches (see Fig. 2 for the description of 
two stages of the processor in Fig. 1). Memories and latches can have multiple input 
and/or output ports — of type inport and outport, respectively. Latch ports have 
an enable signal and a list of data signals. Memory ports additionally have an address 
signal after the enable. Logic gates — and, or, not, = (term-level equality compara- 
tor), and mux (multiplexor, i.e., ITE operator) — are used for the description of the 
control path of a processor. Uninterpreted functions and uninterpreted predicates 
(such as ALU in Fig. 2) are used to abstract blocks of combinational logic (the 
ALU in Fig. 1) as black boxes. Uninterpreted functions and uninterpreted predicates 
with no arguments are considered as term variables and Boolean variables, respec- 
tively, and can be used to abstract constant values that have special semantic mean- 
ing, e.g., the data value 0. 







Fig. 1. Block Diagram of a 3-Stage Pipelined Processor. 



In order to fully exploit the efficiency of Positive Equality, the designer of high-level 
microprocessors must follow some simple restrictions. Data operands must not be 
compared by equality comparators, e.g., in order to determine a branch-on-equal 
condition. Instead, the equality comparison must be abstracted with the same uninter- 
preted predicate in both the implementation and the specification processor. Also, a 
flush signal must be included in the implementation processor, as shown in Fig. 1, in 
order to turn newly fetched instructions into bubbles during flushing. That extra input 
will be optimized away by setting it to 0 (the value during normal operation) when 
translating the high-level processor description to a gate-level synthesizable HDL. 

Flush_bar = (not Flush)
IF_Valid = (and Valid Flush_bar)
(latch IF_EX
    (inport phi2 (SrcReg Data Op DestReg IF_Valid))
    (outport phi1 (EX_SrcReg EX_Data EX_Op EX_DestReg EX_Valid)))
RegsEqual = (= EX_SrcReg WB_DestReg)
forward = (and RegsEqual WB_Valid)
ALU_Data = (mux forward WB_Result EX_Data)
Result = (ALU EX_Op ALU_Data)
(latch EX_WB
    (inport phi2 (Result EX_DestReg EX_Valid))
    (outport phi1 (WB_Result WB_DestReg WB_Valid)))
write_RegFile = (and phi1 WB_Valid)
(memory RegFile
    (inport write_RegFile WB_DestReg (WB_Result))
    (outport phi2 SrcReg (Data)))

Fig. 2. Using our HDL to Describe the Execution and Write-Back Stages. 
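
To make the term-level reading of Fig. 2 concrete, the sketch below builds the expression for Result as a symbolic term in which mux stays an ITE node and ALU stays an uninterpreted function. It is an illustration in Python only: the classes Var and App and the helpers v, app, and show are hypothetical names of ours, not part of TLSim or EVC.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A term or Boolean variable, e.g. a latch output."""
    name: str


@dataclass(frozen=True)
class App:
    """Application of a gate or an uninterpreted function to arguments."""
    op: str
    args: Tuple["Term", ...]


Term = Union[Var, App]


def v(name: str) -> Var:
    return Var(name)


def app(op: str, *args: Term) -> App:
    return App(op, tuple(args))


def show(t: Term) -> str:
    """Print a term in the prefix notation used by the HDL."""
    if isinstance(t, Var):
        return t.name
    return "(" + " ".join([t.op] + [show(a) for a in t.args]) + ")"


# The forwarding logic of the execution stage, as in Fig. 2.
regs_equal = app("=", v("EX_SrcReg"), v("WB_DestReg"))
forward = app("and", regs_equal, v("WB_Valid"))
alu_data = app("mux", forward, v("WB_Result"), v("EX_Data"))
result = app("ALU", v("EX_Op"), alu_data)

print(show(result))
# (ALU EX_Op (mux (and (= EX_SrcReg WB_DestReg) WB_Valid) WB_Result EX_Data))
```

Nothing is evaluated numerically here: the data path is kept as an abstract term, which is the kind of expression a term-level simulator accumulates before a correctness formula is formed.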



3 Tool Flow 

Our term-level symbolic simulator, TLSim, takes as input an implementation and a 
specification processor described in our HDL, as well as a command file that 
defines simulation sequences by asserting the input signals — phase clocks and flush 
controls — to binary values. Symbolic initial state for latches and memories is intro- 
duced automatically and event-driven symbolic simulation is performed according 
to the command file. TLSim allows for multiple simulation sequences to start from 
the same initial state, as well as to use the final state reached after symbolically sim- 
ulating one processor as the initial state for another. States of the same memory or 
latch, reached after different simulation sequences, can be compared for equality. 
The resulting formulas can be connected with similar formulas for other memories 
and latches via Boolean connectives in order to form the EUFM correctness for- 
mula. The symbolic simulation and generation of the correctness formula take less 
than a second even for complex designs. The formula is output in the SVC command 
language [16]. 

Our second tool, EVC (Equality Validity Checker), automatically translates the 
EUFM correctness formula to an equivalent propositional formula by exploiting 
Positive Equality [2] and a number of other optimizations [3][18][20][21]. The 
implementation processor is correct if the propositional formula is a tautology. Oth- 
erwise, a falsifying assignment is a counterexample. The propositional formula can 
be output in a variety of formats, including CNF and ISCAS, allowing the use of 
many SAT procedures for evaluating it. BDD [6] and BED [23] packages are inte- 
grated in EVC. 
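
The tautology criterion can be illustrated with a brute-force check. This is only a sketch of the criterion in Python: EVC hands the formula to SAT, BDD, or BED engines rather than enumerating assignments, and the function falsifying_assignment and the two toy formulas are assumptions made for the example.

```python
from itertools import product


def falsifying_assignment(formula, variables):
    """Return an assignment that makes `formula` false, or None if it is a tautology.

    `formula` maps a dict {variable: bool} to a bool. The search is exponential
    in the number of variables, so it is suitable for tiny examples only.
    """
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if not formula(env):
            return env
    return None


# (a and b) -> a is a tautology, so no counterexample exists.
valid = lambda env: (not (env["a"] and env["b"])) or env["a"]

# a -> b is not a tautology; the falsifying assignment is the counterexample.
buggy = lambda env: (not env["a"]) or env["b"]

print(falsifying_assignment(valid, ["a", "b"]))  # None
print(falsifying_assignment(buggy, ["a", "b"]))  # {'a': True, 'b': False}
```

A SAT procedure reaches the same verdict by searching for a satisfying assignment of the negated formula.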

4 Summary of Results 

A single-issue 5-stage pipelined DLX processor [8] can be formally verified with 
EVC in 0.2 seconds on a 336 MHz Sun4. In contrast, SVC [16] — a tool that does not 
exploit Positive Equality — does not complete the evaluation of the same formula in 
24 hours. Furthermore, the theorem proving approach of completion functions [9] 
could be applied to a similar design after 1 month of manual work by an expert user. 
Finally, the symbolic simulation tool of Ritter, et al. [14] required over 1 hour of 
CPU time for verification of that processor. A dual-issue superscalar DLX with one 
complete and one arithmetic pipeline can be formally verified with EVC in 0.8 sec- 
onds [21]. A comparable design was verified by Burch [5], who needed 30 minutes 
of CPU time only after manually identifying 28 case-splitting expressions, and man- 
ually decomposing the commutative diagram for the correctness criterion into three 
diagrams. Moreover, that decomposition was sufficiently subtle to warrant publica- 
tion of its correctness proof as a separate paper [24]. The theorem proving approach 
of completion functions [9] required again 1 month of manual work for a comparable 
dual-issue DLX. 

EVC has been used to formally verify processors with exceptions, multicycle func- 
tional units, and branch prediction [19]. It can automatically abstract the forwarding 
logic of memories that interact with stalling logic in a conservative way that results 
in an order of magnitude speedup with BDDs [21]. A comparative study [22] of 28 
SAT-checkers, 2 decision diagrams — BDDs [1][6] and BEDs [23] — and 2 ATPG 
tools identified the SAT-checker Chaff [11] as the most efficient means for 
evaluating the Boolean formulas generated by EVC, outperforming the other SAT 
procedures by orders of magnitude. We also compared the e_ij [7] and the small 
domains [12] encodings for replacing equality comparisons that are both negated 
and not negated in the correctness EUFM formula. We found the e_ij encoding to 
result in 4 times faster SAT checking when verifying complex correct designs and 
to consistently perform better for buggy versions. Now a 9-wide VLIW processor 
that imitates the Intel Itanium in many speculative features such as predicated execution, 
register remapping, branch prediction, and advanced loads can be formally verified 
in 12 minutes of CPU time by using Chaff. That design was previously verified in 
31.5 hours with BDDs [20]. It can have up to 42 instructions in flight and is far more 
complex than any other processor formally verified in an automatic way previously. 
We also found Positive Equality to be the most important factor for our success — 
without this property the verification times increase exponentially for very simple 
processors [22], even when using Chaff. 

A preliminary version of the tools has been released to the Motorola M•Core 
Microprocessor Design Center for evaluation. 



5 Conclusions and Future Work 

EVC is an extremely powerful validity checker for the logic of Equality with 
Uninterpreted Functions and Memories (EUFM) [4]. Its efficiency is due to exploiting 
the property of Positive Equality [2] in order to translate a formula in EUFM to a 
propositional formula that can be evaluated with SAT procedures, allowing for gains 
from their improvements. In the future, we will automate the translation of formally 
verified high-level microprocessors, defined in our HDL and verified with EVC, to 
synthesizable gate-level Verilog [17]. TLSim and EVC, as well as the benchmarks used 
for experiments, are available by ftp (http://www.ece.cmu.edu/~mvelev). 

Abstract. As new Internet applications emerge, new security protocols 
and systems need to be designed and implemented. Unfortunately the 
current protocol design and implementation process is often ad-hoc and 
error prone. To solve this problem, we have designed and implemented 
a toolkit AGVI, Automatic Generation, Verification, and Implementation 
of Security Protocols. With AGVI, the protocol designer inputs the 
system specification (such as cryptographic key setup) and security 
requirements. AGVI then automatically finds the near-optimal protocols 
for the specific application, proves the correctness of the protocols, 
and implements the protocols in Java. Our experiments have successfully 
generated new and even simpler protocols than the ones documented in 
the literature. 



1 Introduction 

As the Internet and electronic commerce prosper, new applications emerge 
rapidly and require that new security protocols and systems are designed and 
deployed quickly. Unfortunately, numerous examples show that security protocols 
are difficult to design, difficult to verify, and particularly hard to 
implement correctly: 

— Different security protocols even with the same security properties vary in 
many system aspects such as computation overhead, communication over- 
head and battery power consumption. Therefore it is important to design 
optimal security protocols that suit specific applications. Unfortunately the 
current process of designing a security protocol is usually ad-hoc and in- 
volves little formalism and mechanical assistance. Such a design process is 
not only slow and error prone but also often misses the optimal protocols for 
specific applications. 

— Experience shows that security protocols are often flawed even when they 
are designed with care. To guarantee the correctness of security protocols, 
we need formal and rigorous analysis of the protocols, especially automatic 
protocol verifiers. 

— Software is notoriously flawed. Even if the design of the security protocol is 
correct, various implementation bugs introduced by programmers could still 
easily break the security of the system. 

To solve these problems, we designed and implemented the AGVI toolkit, 
which stands for Automatic Generation, Verification, and Implementation of 
Security Protocols. With AGVI, the protocol designer specifies the desired secu- 
rity requirements, such as authentication and secrecy, and system specification, 
e.g., symmetric or asymmetric encryption/decryption, low bandwidth. A protocol 
generator then generates candidate security protocols which satisfy the system 
requirements using an intelligent exhaustive search in a combinatorial protocol 
space. Then a protocol screener analyzes the candidate protocols, discards the 
flawed protocols, and outputs the correct protocols that satisfy the desired secu- 
rity properties. In the final step, a code generator automatically outputs a Java 
implementation from the formal specification of the generated security protocols. 

Even a simple security protocol can have an enormous protocol space (for 
example, for a four-round authentication protocol, even after constraining 
message format and sending order, we estimate that there are at least 10^12 
possible variation protocols that one would need to consider to find an optimal 
one for the specific application!). Facing this challenge, we have developed 
powerful reduction techniques for the protocol generator to weed out obviously 
flawed protocols. 
Because the protocol generator uses simple criteria to rule out obviously flawed 
protocols, it is fast and can analyze 10,000 protocols per second. Protocols that 
were not found flawed by the protocol generator are then sent to the protocol 
screener, which can prove whether the protocol is correct or not. Our protocol 
screener can analyze protocol executions with arbitrary protocol configurations. 
When it terminates, it either provides a proof that a protocol satisfies its 
specified property under any protocol configuration, or it generates a 
counterexample if the property does not hold. It 
also exploits many state space reduction techniques to achieve high efficiency. 
On average, our protocol screener can check 5 to 10 synthesized protocols per 
second (measured on a 500 MHz Pentium III workstation running Linux). 

We have successfully experimented with AGVI in several applications. We 
have found new authentication and key distribution protocols using 
AGVI, and some of them are even simpler than the standard protocols 
documented in the literature, such as ISO standards. Details about the 
experiments and techniques in the tool can be found in [PS00a, PS00b]. 

2 Components in AGVI 

2.1 The Protocol Generator 

Our protocol generator generates candidate protocols that satisfy the specified 
system specification and discards obviously flawed protocols at an early stage. 
Intuitively, the protocol space is infinite. Our solution is to use iterative 
deepening, a standard search technique. In each iteration, we set a cost threshold 
of protocols. We then search through the protocol space to generate all the 
protocols below the given cost threshold. After sorting the protocols, the protocol 
screener tests them in the order of increasing cost. If one protocol satisfies the 
desired properties, it is minimal with respect to the cost metric function given 
by the user and the generation process stops. Otherwise, we increase the cost 
threshold and generate more protocols. 
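
The iterative deepening loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AGVI implementation: candidates are toy strings rather than protocols, and generate, cost, and is_correct are hypothetical stand-ins for the protocol generator, the user's cost metric, and the protocol screener.

```python
from itertools import product


def iterative_deepening(generate, cost, is_correct, thresholds):
    """Return the cheapest candidate accepted by `is_correct`, or None.

    For each cost threshold, all candidates below the threshold are generated,
    sorted by cost, and screened in order of increasing cost.
    """
    tested = set()
    for limit in thresholds:
        batch = [p for p in generate(limit) if cost(p) <= limit and p not in tested]
        for candidate in sorted(batch, key=cost):
            tested.add(candidate)
            if is_correct(candidate):
                return candidate  # minimal with respect to `cost`
    return None


def generate(limit):
    """Toy generator (assumption): every string over {a, b} of length <= limit."""
    for n in range(1, limit + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)


found = iterative_deepening(
    generate,
    cost=len,                           # toy cost metric: string length
    is_correct=lambda p: "abb" in p,    # toy screener
    thresholds=range(1, 6),
)
print(found)  # abb
```

The result is minimal only if the generator is exhaustive below each threshold, which is why pruning in the real generator must discard only candidates that are certainly flawed.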

A simple three-party authentication and key distribution protocol has a 
protocol space of order 10^12. Our protocol generator generates and analyzes 
10,000 protocols per second, so exploring the entire space would take over 
three years. We have developed powerful protocol space reduction techniques to 
prune the search tree at an early stage. With these pruning techniques, it only 
takes the protocol generator a few hours to scan through the protocol space of 
order 10^12. More details are included in [PS00a, PS00b]. 
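
The "over three years" figure can be reproduced with a short calculation. The sketch assumes a protocol space of 10^12 candidates, which is the order of magnitude consistent with the stated rate of 10,000 protocols per second and the stated duration; the variable names are ours.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

space = 10**12   # assumed size of the protocol space
rate = 10_000    # protocols analyzed per second, as stated in the text

years = space / rate / SECONDS_PER_YEAR
print(round(years, 2))  # 3.17
```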



2.2 The Protocol Screener 



We use Athena as the protocol screener [Son99, SBP00]. Athena uses an extension 
of the recently proposed Strand Space Model [THG98] to represent protocol 
execution. Athena incorporates a new logic that can express security properties 
including authentication, secrecy and properties related to electronic commerce. 
An automatic procedure enables Athena to evaluate well-formed formulae in 
this logic. For a well-formed formula, if the evaluation procedure terminates, it 
will generate a counterexample if the formula is false, or provide a proof if the 
formula is true. Even when the procedure does not terminate for arbitrary 
configurations of the protocol execution (for example, any number of 
initiators and responders), termination can be forced by bounding the number 
of concurrent protocol runs and the length of messages, as is done in most existing 
automatic tools. 

Athena also exploits several state space reduction techniques. Powered with 
techniques such as backward search and symbolic representation, Athena natu- 
rally avoids the state space explosion problem commonly caused by asynchronous 
composition and symmetry redundancy. Athena also has the advantage that it 
can easily incorporate results from theorem proving through unreachability 
theorems. By using the unreachability theorems, it can prune the state space at 
an early stage and hence further reduce the state space explored and increase the 
likelihood of termination. These techniques dramatically reduce the state space 
that needs to be explored. 



2.3 The Code Generator 

Our goal for the automatic code generator is to prevent implementation 
weaknesses and to obtain a secure implementation if the initial protocol is secure. 
The code generator is essentially a translator which translates the formal 
specification into Java code. Given that the translation rules are correct, the final 
implementation can be shown to be correct using proof by construction. In particular, 
we show that our implementation is secure against some of the most common 
vulnerabilities: 

— Buffer overruns account for more than half of all recent security 
vulnerabilities. Since we use Java as our implementation language, our automatically 
generated code is immune to this class of attack. 

— False input attacks result from unchecked input parameters or unchecked 
conditions or errors. Our automatic implementation ensures that all input 
parameters are carefully checked to have the right format before use. 

— Type flaws occur when one message component can be interpreted as an- 
other message component of a different form. In the implementation, we use 
typed messages to prevent type flaws. 

— Replay attacks and freshness attacks are attacks where the attacker can 
reuse old message components in the attack. Athena already ensures that the 
protocols are secure against these attacks. To ensure that the implementation 
is secure, we use cryptographically secure pseudo-random number generators 
to create secure nonces. 
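
The three implementation-level defenses in the list (input checking, typed messages, and secure nonces) can be sketched in a few lines. The generated code described in the text is Java; the Python below is only an analogy, and Nonce, PrincipalName, parse_field, the field tags, and the 16-byte nonce length are all assumptions made for the example.

```python
import secrets
from dataclasses import dataclass

NONCE_BYTES = 16  # assumed nonce length, not taken from the paper


@dataclass(frozen=True)
class Nonce:
    """A typed nonce: it cannot be confused with a name or any other field."""
    value: bytes

    def __post_init__(self):
        if not isinstance(self.value, bytes) or len(self.value) != NONCE_BYTES:
            raise ValueError("nonce must be exactly NONCE_BYTES bytes")


@dataclass(frozen=True)
class PrincipalName:
    """A typed principal name with a format check."""
    value: str

    def __post_init__(self):
        if not isinstance(self.value, str) or not self.value.isalnum():
            raise ValueError("principal name must be a non-empty alphanumeric string")


def fresh_nonce() -> Nonce:
    """Draw a nonce from the operating system's cryptographically secure generator."""
    return Nonce(secrets.token_bytes(NONCE_BYTES))


def parse_field(tag: str, payload: bytes):
    """Decode one tagged field, rejecting unknown tags and malformed payloads."""
    if tag == "nonce":
        return Nonce(payload)
    if tag == "name":
        return PrincipalName(payload.decode("ascii"))
    raise ValueError("unknown field tag: " + tag)


n = fresh_nonce()
print(len(n.value))                        # 16
print(parse_field("name", b"Alice").value)  # Alice
```

Because each field is parsed into its own type and validated on construction, a payload of the wrong shape is rejected before any protocol logic sees it.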

The code generator uses the same protocol description as Athena uses. The 
generated code provides a simple yet flexible API for the application programmer 
to interface with. More details about the code generator can be found in [PPS00]. 




Fig. 1. AGVI GUI. 



3 Experiments 

We have used AGVI to automatically generate and implement authentication 
and key distribution protocols involving two parties with or without a trusted 
third party. In one experiment, we vary the system aspects: one system 
specification has a low computation overhead but a high communication overhead, 
and another system specification has a low communication overhead and a high 
computation overhead. AGVI found different optimal protocols for the metric 
functions used in the two different cases. In another experiment, we vary the 
security properties required by the system. Key distribution protocols normally 
have a long list of possible security properties, and an application might only 
require a subset of the list. AGVI also found different optimal protocols for 
different security requirements. In both experiments, AGVI found new protocols 
that are more efficient than or as efficient as the protocols documented in the 
literature. More details can be found in [PS00a, PS00b]. 

Decision procedures are at the core of many industrial-strength verification 
systems such as ACL2 [KM97], PVS [ORS92], or STeP [MtSg96]. Effective use 
of decision procedures in these verification systems requires the management of 
large assertional contexts. Many existing decision procedures, however, lack an 
appropriate API for managing contexts and efficiently switching between contexts, 
since they are typically used in a fire-and-forget environment. 

ICS (Integrated Canonizer and Solver) is a decision procedure developed 
at SRI International. It not only efficiently decides formulas in a useful 
combination of theories but also provides an API that makes it suitable for 
use in applications with highly dynamic environments such as proof search or 
symbolic simulation. 

The theory decided by ICS is a quantifier-free, first-order theory with unin- 
terpreted function symbols and a rich combination of datatype theories including 
arithmetic, tuples, arrays, sets, and bit-vectors. This theory is particularly inter- 
esting for many applications in the realm of software and hardware verification. 
Combinations of a multitude of datatypes occur naturally in system 
specifications, and the use of uninterpreted function symbols has proven to be 
essential for many real-world verifications. 

The core of ICS is a congruence closure procedure for the theory 
of equality and disequality with both uninterpreted and interpreted function 
symbols. This algorithm is based on the concepts of canonization and solving 
as introduced by Shostak [Sho84]. These basic notions have been extended to 
include inequalities over linear arithmetic terms and propositional logic. 
Altogether, the theory supported by ICS is similar to the ones underlying the PVS 
decision procedures and SVC [BDL96]; it includes: 

— Function application f(t1, ..., tn) for uninterpreted function symbols f of 
arity n. 

— The usual propositional constants true, false and connectives not, &, |, 
=>, <=>. 

— Equality (=) and disequality (/=). 

— Rational constants and the arithmetic operators +, *, -; note that the 
decision procedure is complete only for multiplication restricted to multiplication 
by constants. Arithmetic predicates include an integer test and the usual 
inequalities <, <=, >, >=. 

* This work was supported by SRI International, by NSF Grant CCR-0082560, 
DARPA/AFRL Contract F33615-00-C-3043, and by NASA Contract NAS1-00079. 
ICS (TM) is a trademark of SRI International. 

— Tuples (t1, ..., tn) together with the proj[i,n](t) operator for projecting 
the i-th element of an n-tuple. 

— Lookup a[x] and update a[x:=t] operations for a functional array a. 

— The constant sets (empty, full), set membership (x in s), and set operators, 
including complement (compl(s)), union (s1 union s2), and intersection 
(s1 inter s2). 

— Fixed-sized bitvectors including constants, concatenation (b1 ++ b2), 
extraction (b[i:j]), bit-wise operations like bit-wise conjunction, and built-in 
arithmetic relations such as add(b1, b2, b). This latter constraint encodes 
the fact that the sum of the unsigned interpretations of b1 and b2 equals the 
unsigned interpretation of b. Fixed-sized bitvectors are decided using the 
techniques described in [MR98]. 

ICS is capable of deciding formulas such as 

— x+2 = y => f(a[x:=3][y-2]) = f(y-x+1) 

— f(y-1)-1 = y+1 & f(x)+1 = x-1 & x+1 = y => false 

— f(f(x)-f(y)) /= f(z) & y <= x & y >= x+z & z >= 0 => false 

These formulas contain uninterpreted function symbols such as f and interpreted 
symbols drawn from the theories of arithmetic and functional arrays. 
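
As a sanity check on the three example formulas, the sketch below evaluates them on random integer values with randomly chosen interpretations of f and of the array a. This is plain testing in Python, not a decision procedure and not how ICS works: it can only fail to find a counterexample. The helper names are ours, and the restriction to small integers is an assumption of the example.

```python
import random


def make_function(rng):
    """A random but consistent interpretation of an uninterpreted symbol."""
    table = {}

    def f(n):
        if n not in table:
            table[n] = rng.randint(-5, 5)
        return table[n]

    return f


def update(a, i, value):
    """Functional array update a[i:=value]; the original array is unchanged."""
    return lambda j: value if j == i else a(j)


def implies(p, q):
    return (not p) or q


def formula1(x, y, f, a):
    return implies(x + 2 == y, f(update(a, x, 3)(y - 2)) == f(y - x + 1))


def formula2(x, y, f):
    return implies(f(y - 1) - 1 == y + 1 and f(x) + 1 == x - 1 and x + 1 == y, False)


def formula3(x, y, z, f):
    return implies(f(f(x) - f(y)) != f(z) and y <= x and y >= x + z and z >= 0, False)


rng = random.Random(0)
all_hold = True
for _ in range(10000):
    f, a = make_function(rng), make_function(rng)
    x, y, z = (rng.randint(-3, 3) for _ in range(3))
    all_hold = all_hold and formula1(x, y, f, a) and formula2(x, y, f) and formula3(x, y, z, f)

print(all_hold)  # True
```

Since all three formulas are valid, no interpretation can falsify them, so the check succeeds regardless of the random seed.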

Verification conditions are usually proved within the context of a large num- 
ber of assertions derived from the antecedents of implications, conditional tests, 
and predicate subtype constraints. These contexts must be changed in an incre- 
mental manner when assertions are either added or removed. Through the use 
of functional data structures, ICS allows contexts to be incrementally enriched 
in a side-effect-free manner. 
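
The idea of side-effect-free context enrichment can be shown with an immutable value type. This is only an analogy in Python: ICS is written in Ocaml and relies on data structures such as Patricia trees that share structure between versions, whereas the frozenset used here is copied on each addition. The Context class and its add method are hypothetical names.

```python
from typing import FrozenSet, NamedTuple


class Context(NamedTuple):
    """An immutable assertional context."""
    assertions: FrozenSet[str]

    def add(self, formula: str) -> "Context":
        """Return an enriched context; the receiver is left untouched."""
        return Context(self.assertions | {formula})


empty = Context(frozenset())
c1 = empty.add("x + 2 = y")
c2 = c1.add("y >= 0")   # one branch of a case split
c3 = c1.add("y < 0")    # the other branch, built from the same parent

print(sorted(c1.assertions))  # ['x + 2 = y']
print(sorted(c2.assertions))  # ['x + 2 = y', 'y >= 0']
print(sorted(c3.assertions))  # ['x + 2 = y', 'y < 0']
```

Switching between contexts is then just a matter of holding on to the old value, and removing an assertion amounts to returning to the context that preceded it.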

ICS is implemented in Ocaml, which offers satisfactory run-time performance, 
efficient garbage collection, and interfaces well with other languages like C. The 
implementation of ICS is based on optimization techniques such as hash-consing 
and efficient data structures like Patricia trees for representing sets and maps 
efficiently. ICS uses arbitrary precision rational numbers from the GNU 
multi-precision library (GMP). 

There is a well-defined API for manipulating ICS terms, asserting formulas to 
the current database, switching between databases, and functions for 
maintaining normal forms and for testing the validity of assertions by means of 
canonization. This API is packaged as a C library, an Ocaml module, and a Common Lisp 
interface. The C library API, for example, has been used to connect ICS with 
PVS [ORS92], and both an interactive and a batch processing capability have 
been built using this API. 

Consider, for example, processing f(y - 1) - 1 = y + 1, f(x) + 1 = x - 1, 
and x + 1 = y from left to right using the interactive mode of ICS. 

$ ics 
ICS interpreter. Copyright (c) 2001 SRI International. 
Ctrl-d to exit. 
> assert f(y-1)-1=y+1. 

This equation is asserted in its solved form as y = -2 + f(-1 + y). This 
equation is indeed considered to be in solved form, since y on the right-hand side 
occurs only in the scope of the uninterpreted f. Terms in the database are 
partitioned into equivalence classes, and the canonical representative of any term t 
with respect to this partition is represented by can t; for example: 

> can -1 + y. 
-3 + f(-1 + y) 



It can be shown that can t1 is identical to can t2 iff the equality t1 = t2 is 
derivable in the current context. Now, the second equation is processed 

> assert f(x) + 1 = x - 1. 

by canonizing it to 1 + f(x) = -1 + x and solving this equation as x = 2 + 
f(x). Finally, can x + 1 yields 3 + f(x) and can y is -2 + f(-1 + y). Thus, 
the third equation is solved as f(x) = -5 + f(-1 + y). Since f(x) = f(-1 + 
y), using x = -1 + y and congruence, there is a contradiction -5 = 0. Indeed, 
ICS detects this inconsistency when given the assertion below. 

> assert x + 1 = y. 

Inconsistent ! 
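
The arithmetic in this session can be replayed by hand. The sketch below stores linear terms as dictionaries from atoms to coefficients and treats the applications f(x) and f(-1+y) as opaque atoms. It is not the ICS algorithm: the solved forms are entered directly from the text, the congruence step is applied manually, and the names add and subst are ours.

```python
ONE = "1"          # key holding the constant part of a linear term
FX = "f(x)"        # uninterpreted applications are treated as atoms
FY = "f(-1+y)"


def add(p, q, k=1):
    """Return p + k*q, dropping atoms whose coefficient becomes zero."""
    out = dict(p)
    for atom, c in q.items():
        out[atom] = out.get(atom, 0) + k * c
    return {a: c for a, c in out.items() if c != 0}


def subst(p, atom, q):
    """Replace `atom` in the linear term p by the linear term q."""
    if atom not in p:
        return p
    rest = {a: c for a, c in p.items() if a != atom}
    return add(rest, q, p[atom])


# Solved forms quoted in the text.
y_solved = {ONE: -2, FY: 1}   # y = -2 + f(-1 + y)
x_solved = {ONE: 2, FX: 1}    # x = 2 + f(x)

# can -1 + y
print(subst({ONE: -1, "y": 1}, "y", y_solved))   # {'1': -3, 'f(-1+y)': 1}

# Third equation x + 1 = y, with both sides canonized.
lhs = subst({"x": 1, ONE: 1}, "x", x_solved)     # 3 + f(x)
rhs = subst({"y": 1}, "y", y_solved)             # -2 + f(-1 + y)
diff = add(rhs, lhs, -1)                         # rhs - lhs, which must equal 0

# Congruence: x = -1 + y implies f(x) = f(-1 + y).
residue = subst(diff, FY, {FX: 1})
print(residue)                                   # {'1': -5}
```

The residue states -5 = 0, a nonzero constant equated with zero, which is the contradiction reported in the session.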



The efficiency and scalability of ICS in processing formulas, the richness of its 
API, and its ability for fast context-switching should make it possible to use 
it as a black box for discharging verification conditions not only in a theorem 
proving context but also in applications like static analysis, abstract interpreta- 
tion, extended type checking, symbolic simulation, model checking, or compiler 
optimization. 

ICS is available free of charge under the PVS license. It will also be included 
in the upcoming release of PVS 3.0. The complete sources and documentation 
of ICS are available at 



http://www.icansolve.com 

1 Introduction 

μCRL is a language for specifying and verifying distributed systems in an 
algebraic fashion. It targets the specification of system behaviour in a 
process-algebraic style and of data elements in the form of abstract data types. The 
μCRL toolset (see http://www.cwi.nl/~mcrl) supports the analysis and 
manipulation of μCRL specifications. A μCRL specification can be automatically 
transformed into a linear process operator (LPO). All other tools in the μCRL 
toolset use LPOs as their starting point. The simulator allows the interactive 
simulation of an LPO. There are a number of tools that allow optimisations on 
the level of LPOs. The instantiator generates a labelled transition system (LTS) 
from an LPO (under the condition that it is finite-state), and the resulting LTS 
can be visualised, analysed and minimised. 

An overview of the μCRL toolset is presented in Figure 1. This picture is 
divided into three layers: μCRL specifications, LPOs and LTSs. The rectangular 
boxes denote different ways to represent instances of the corresponding layer (for 
example, LPOs can be represented in a binary or a textual form). A solid arrow 
denotes a transformation from one instance to another that is supported by the 
μCRL toolset; keywords attached to these arrows indicate which kinds of 
transformations are involved. Finally, the oval boxes represent several ways to 
analyse systems, and dashed arrows show how the different representations of 
LPOs and LTSs can be analysed. The box named BCG and its three outgoing dashed 
arrows actually belong to the CADP toolset (see Section 4). The next three 
sections are devoted to explaining Figure 1 in more detail. 

The μCRL toolset has been used successfully to analyse a wide range of protocols 
and distributed algorithms. Recently it was used to support the optimised 
redesign of the Transactions Capabilities Procedures in the SS No. 7 protocol 
stack for telephone exchanges, to detect a number of mistakes in a real-life 
protocol over the CAN bus for lifting trucks, to analyse a leader election 
protocol from the Home Audio/Video interoperability (HAVi) architecture, and to 
perform scenario-based verifications of the coordination language SPLICE. 

Fig. 1. The Main Components of the μCRL Toolset. 



2 μCRL Specifications 

The μCRL language is based on the process algebra ACP. It allows one to specify 
system behaviour in an algebraic style using atomic actions, alternative and 
sequential composition, parallelism and communication, encapsulation, hiding, 
renaming and recursive declarations. Furthermore, μCRL supports equationally 
specified abstract data types. In order to intertwine processes and data, atomic 
actions and recursion variables carry data parameters. Moreover, an if-then-else 
construct allows data elements to influence the course of a process, and 
alternative quantification chooses from a possibly infinite data domain. 



3 Linear Process Operators 

When investigating systems specified in μCRL, our current standard approach is 
to transform the μCRL specification under scrutiny to a relatively simple format 
without parallelism or communication, called an LPO. In essence this is a vector 
of data parameters together with a list of condition, action and effect triples, 
describing when an action may happen and what its effect is on the vector of 
data parameters. It is stored in a binary format or as a plain text file. From an 
LPO one can generate an LTS, in which the states are parameter vectors and 
the edges are labelled with parametrised actions. 
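
The relation between an LPO and its LTS can be illustrated with a minimal sketch. It is not the μCRL instantiator: the names `Summand` and `instantiate` are hypothetical, sum variables are omitted, and conditions, actions and effects are modelled as plain Python functions over the parameter vector.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[int, ...]  # a vector of data parameters


@dataclass(frozen=True)
class Summand:
    """One condition/action/effect triple of a (hypothetical) LPO."""
    condition: Callable[[State], bool]   # when may the action happen?
    action: Callable[[State], str]       # the parametrised action label
    effect: Callable[[State], State]     # the new parameter vector


def instantiate(initial, summands):
    """Breadth-first generation of the LTS reachable from `initial`.

    Terminates only if the reachable state space is finite.
    """
    states, edges = {initial}, set()
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for summand in summands:
            if summand.condition(s):
                t = summand.effect(s)
                edges.add((s, summand.action(s), t))
                if t not in states:
                    states.add(t)
                    queue.append(t)
    return states, edges


# Toy LPO: a counter with one parameter n that ticks up to 2 and then resets.
COUNTER = [
    Summand(lambda s: s[0] < 2, lambda s: f"tick({s[0]})", lambda s: (s[0] + 1,)),
    Summand(lambda s: s[0] == 2, lambda s: "reset", lambda s: (0,)),
]

states, edges = instantiate((0,), COUNTER)
```

For the toy counter the generated LTS has three states and three edges; the states are exactly the reachable parameter vectors.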

A large class of μCRL processes can be transformed automatically to a bisimilar 
LPO. The resulting LPO and its data structures are stored as ATerms. The ATerm 
library stores terms in a very compact way, keeping memory requirements minimal 
by employing maximal sharing and a tailor-made garbage collector. Moreover, the 
ATerm library uses a file format that is even more compact than the memory 
format. 

The μCRL toolset comprises five tools (constelm, sumelm, parelm, structelm 
and rewr) that target the automated simplification of LPOs while preserving 
bisimilarity. These tools do not require the generation of the LTS belonging 
to an LPO, thus circumventing the ominous state explosion problem. The 
simplification tools are remarkably successful at simplifying the LPOs belonging 
to a number of existing protocols. In some cases these simplifications lead to a 
substantial reduction of the size of the corresponding LTS. 

Elimination of constant parameters. A parameter of an LPO can be re- 
placed by a constant value, if it can be statically determined that this pa- 
rameter remains constant throughout any run of the process. 

Elimination of sum variables. The choice of a sum ranging over some data 
type may be restricted by a side condition to a single concrete value. In that 
case the sum variable can be replaced by this single value. 

Elimination of inert parameters. A parameter of an LPO that has no (di- 
rect or indirect) influence on the parameters of actions or on conditions does 
not influence the LPO’s behaviour and can be removed. Whereas the two re- 
duction techniques mentioned above only simplify the description of an LPO, 
elimination of inert parameters may lead to substantial reduction of the LTS 
underlying an LPO. If the inert parameter ranges over an infinite domain, 
the number of states can even reduce from infinite to finite by this operation. 
This typically happens after hiding part of the system’s behaviour. 

Elimination of data structures. Sometimes, the operations above cannot be 
applied to single parameters, but they can be applied to parts of the data 
structures that these variables range over. For this to take place, such data 
structures must be partitioned into their constituents. 

Rewriting data terms. The data terms occurring in an LPO can be rewritten 
using the equations of the data types. If a condition is rewritten to false, then 
the corresponding condition, action and effect triple in the LPO is removed. 
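
As an illustration of the first item in this list, the following sketch detects constant parameters by a simple fixed-point iteration. It is an assumption-laden toy, not the constelm tool: the function name `constant_parameters` is hypothetical, each summand is reduced to a map from parameters to next-state expressions, and an expression is either an integer constant, the name of a parameter, or anything else (treated as unknown).

```python
def constant_parameters(init, summands):
    """Return the parameters that provably keep their initial value.

    init:     dict mapping each parameter name to its initial (int) value.
    summands: list of dicts mapping a parameter name to its next-state
              expression; a missing entry means "parameter unchanged".
    """
    constant = set(init)          # optimistic start: everything is constant
    changed = True
    while changed:                # iterate until no assumption is refuted
        changed = False
        for update in summands:
            for p in list(constant):
                expr = update.get(p, p)
                if isinstance(expr, str) and expr in constant:
                    value = init[expr]      # substitute an assumed constant
                elif isinstance(expr, int):
                    value = expr            # literal constant
                else:
                    value = None            # unknown or non-constant
                if value != init[p]:
                    constant.discard(p)
                    changed = True
    return constant


# Parameter a is never modified, b is only ever assigned a, n is incremented.
example = constant_parameters(
    {"a": 0, "b": 0, "n": 0},
    [{"n": ("n", "+", 1), "b": "a"}, {"n": 0}],
)

# Here a is overwritten, so b (which copies a) loses its constant status too.
chained = constant_parameters({"a": 0, "b": 0}, [{"a": 1, "b": "a"}])
```

The optimistic start followed by refutation is what allows b to be recognised as constant in the first example even though it is assigned in a summand.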

Confluence is widely recognised as an important feature of the behaviour 
of distributed communicating systems. Roughly, a τ-transition from a state in 
an LTS, representing an internal computation that is externally invisible, is 
confluent if it commutes with any other transition starting in this same state. 
It has been shown that confluence can be used in process verification. Several 
notions of confluence have been studied with respect to their practical 
applicability, and it was shown that on the level of LPOs confluence can be 
expressed by means of logical formulas. The presence of confluence within an LPO 
can be exploited at a low cost at the level of the instantiator, i.e., during the 
generation of the associated LTS. A prototype of this generation algorithm was 
implemented, and experience shows that this exploitation of confluence within an 
LPO may lead to the generation of an LTS that is several orders of magnitude 
smaller than the one produced by the standard instantiator. The detection of 
confluence in an LPO is performed using the automated reasoning techniques that 
are surveyed in Section 5. 

4 Labelled Transition Systems 

The SVC format offers an extremely compact file format for storing LTSs. 
This format is open in its specification and implementation, and allows states 
to be labelled by ATerms. A prototype visualisation tool has been developed 
for the SVC format, dubbed Drishti. A reduction algorithm based on conflu- 
ence and minimisation algorithms modulo equivalences such as bisimulation and 
branching bisimulation have been implemented, collapsing equivalent states. 
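
To make the idea of collapsing equivalent states concrete, here is a naive partition-refinement sketch for strong bisimulation only. It is an illustration under stated assumptions, not the algorithm used in the toolset: the function name `bisimulation_classes` is hypothetical, and branching bisimulation would additionally need special treatment of τ-steps.

```python
def bisimulation_classes(states, edges):
    """Partition `states` into strong-bisimulation classes.

    edges is a set of (source, label, target) triples. The partition is
    refined until every block is stable with respect to all other blocks.
    """
    block_of = {s: 0 for s in states}            # start with a single block
    while True:
        # Signature of a state: its block plus the (label, block) pairs
        # it can reach in one step.
        signature = {
            s: (block_of[s],
                frozenset((a, block_of[t]) for (u, a, t) in edges if u == s))
            for s in states
        }
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block_of.values())):
            break                                # no block was split: stable
        block_of = refined
    classes = {}
    for s, b in block_of.items():
        classes.setdefault(b, set()).add(s)
    return {frozenset(c) for c in classes.values()}


# States 1 and 2 both offer only b to state 3, so they are bisimilar.
classes = bisimulation_classes(
    {0, 1, 2, 3},
    {(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "b", 3)},
)
```

Replacing each class by a single state yields the minimised LTS; this sketch recomputes all signatures in every round and is meant for illustration only.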

Alternatively, LTSs belonging to μCRL specifications can be visualised and 
analysed using the Caesar/Aldebaran Development Package (CADP). This 
toolset originally targeted the analysis of LOTOS specifications. Caesar 
generates the LTS belonging to a LOTOS specification, and supports simulation. 
Aldebaran performs equivalence checking and minimisation of LTSs modulo a 
range of process equivalences. XTL offers facilities for model checking formulas 
in temporal logics. The CADP toolset comprises the BCG format, which supports 
compact storage of LTSs. SVC files can be translated to BCG format and 
vice versa, given a CADP license (as the BCG format is not open source). 

A reduction algorithm for LTSs has been developed that is based on 
prioritisation of confluent τ-transitions. First the maximal class of confluent 
τ-transitions is determined, and next outgoing confluent τ-transitions from a 
state are given priority over all other outgoing transitions from this same 
state. For LTSs that do not contain an infinite sequence of τ-transitions, this 
reduction preserves branching bisimulation. An implementation of this algorithm 
is included in the μCRL toolset. In some cases it reduces the size of an LTS by 
an exponential factor. Furthermore, the worst-case time complexity of this 
reduction algorithm compares favourably with minimisation modulo branching or 
weak bisimulation equivalence. Hence, the algorithm can serve as a useful 
preprocessing step to these minimisation algorithms. 
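
The prioritisation step can be sketched as follows. This is a simplified reading of the description above, not the toolset's implementation: the function name `prioritise` is hypothetical, the set of confluent τ-transitions is assumed to have been computed beforehand, and the LTS is assumed to contain no infinite sequence of τ-transitions.

```python
TAU = "tau"


def prioritise(initial, edges, confluent):
    """Keep only confluent tau-edges in states that have one, then prune.

    edges:     set of (source, label, target) triples.
    confluent: subset of edges, all labelled TAU, assumed to be confluent.
    Returns the reachable states and the edges that were kept.
    """
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)

    kept, seen, stack = set(), {initial}, [initial]
    while stack:
        s = stack.pop()
        out = outgoing.get(s, [])
        preferred = [e for e in out if e in confluent]
        # Confluent tau-transitions take priority over everything else.
        for edge in (preferred if preferred else out):
            kept.add(edge)
            if edge[2] not in seen:
                seen.add(edge[2])
                stack.append(edge[2])
    return seen, kept


# A diamond: the tau-step and the a-step commute, so tau is confluent.
DIAMOND = {(0, TAU, 1), (0, "a", 2), (1, "a", 3), (2, TAU, 3)}
reduced_states, reduced_edges = prioritise(
    0, DIAMOND, {(0, TAU, 1), (2, TAU, 3)}
)
```

In the diamond, state 2 becomes unreachable after prioritisation, which is the kind of saving the reduction aims for.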



5 Symbolic Reasoning about Infinite-State Systems 

For very large finite-state systems, a symbolic analysis on the level of LPOs may 
result in the generation of much smaller LTSs. For systems with an inherently 
infinite number of states the use of theorem proving techniques is indispensable. 

The original motivation behind the LPO format was that several properties 
of a system can be uniformly expressed by first-order formulae. Effective 
proof methods for LPOs have been developed, incorporating the use of invariants 
and state mappings. The confluence property of an LPO can also be expressed as 
a large first-order formula. Using these techniques, large distributed systems 
were verified in a precise and logical way, often with the help of interactive 
theorem provers. An overview of such case studies has been published separately. 

Since the confluence properties and the correctness criteria associated with 
state mappings for industrial-scale case studies tend to be rather flat but very 
large, we are developing a specialised theorem prover based on an extension of 
BDDs with equality. A prototype tool has been implemented, which was used to 
detect confluence in a leader election protocol and in a Splice specification. 
(This information on confluence was exploited as described in the preceding 
sections.) This tool can also check invariants and the correctness criteria 
associated with a state mapping between a specification and its implementation. 

1 Introduction 

Concurrent software and hardware systems play an increasing role in today’s 
applications. Due to the large number of states and to the high degree of 
nondeterminism arising from the dynamic behavior of such systems, testing is 
generally not sufficient to ensure the correctness of their implementation. 
Formal specification and verification methods are therefore becoming more and 
more popular, aiming to give rigorous support for the system design and for 
establishing its correctness properties, respectively. 

In view of the inherent complexity of formal methods it is desirable to provide 
the user with tool support. It is even indispensable for the design of 
safety-critical concurrent systems where an ad hoc or conventional software 
engineering approach is not justifiable. There is one particularly successful 
automated approach to verification, called model checking, in which one tries to 
prove that (a model of) a system has certain properties specified in a suitable 
logic. 

In recent years several prototypes of model-checking tools have been 
developed, e.g., CWB, NCSU-CWB, SPIN, and the symbolic model 
checker SMV. Most of these are tailored to a specific setting, choosing, e.g., 
the CCS process algebra with transition-system semantics as the specification 
language and offering model checking for the modal μ-calculus. 

However, in the theoretical modeling and in the implementation of concurrent 
systems there exists a wide range of specification formalisms, semantic domains, 
logics, and model-checking algorithms. Our aim is therefore to offer a modular 
verification system which can be easily adjusted to different settings. We started 
out in 1998 with the development of an initial version of our tool, called Truth, 
which is described in Section 2. It was complemented later by rapid prototyping 
support for specification languages, provided by the SLC specification language 
compiler generator presented in Section 3. The most recent component of the 
Truth Verification Platform is a dedicated parallel version running on 
workstation clusters which is intended for high-end verification tasks, and which 
is briefly described in Section 4. 

2 Truth: The Basic Tool 

Here we give a short account of the actual Truth tool. For a more thorough 
presentation, the reader is referred to the Truth publications and web page, 
where different releases can be downloaded. 

In its basic version Truth supports the specification and verification of 
concurrent systems described in CCS, a well-known process algebra. To support 
the understanding of the system’s behaviour, the specification can be graphically 
simulated in an interactive and process-oriented way. Figure 1 shows a 
screenshot for the simulation of a two-place buffer process B2, composed in 
parallel of two communicating instances of a unary buffer B1. 



Fig. 1. A Process-Oriented Simulation of a Two-Place Buffer (screenshot of the 
window "Truth - Simulation of CCS Processes"). 



From the specification a labeled transition system is built. Its desired 
properties can be expressed using the μ-calculus, a powerful logic which allows 
one to describe various safety, liveness, and fairness properties. It 
semantically subsumes the temporal logics CTL (whose operators are implemented 
as macros in Truth), CTL*, and LTL. 
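
How CTL operators reduce to μ-calculus fixed points can be sketched on an explicit transition system. The code below is a global, set-based illustration with hypothetical function names; it is not the local tableau-based or game-based algorithm used in Truth. It encodes EF p as the least fixed point of X = p ∪ pre(X) and derives AG p by duality.

```python
def pre_exists(edges, target):
    """States with at least one successor in `target` (the diamond modality)."""
    return {s for (s, _label, t) in edges if t in target}


def least_fixed_point(f):
    """Iterate a monotone function on finite sets, starting from the empty set."""
    current = set()
    while True:
        following = f(current)
        if following == current:
            return current
        current = following


def ef(edges, p):
    """EF p, read as the least fixed point of X = p or <->X."""
    return least_fixed_point(lambda x: set(p) | pre_exists(edges, x))


def ag(states, edges, p):
    """AG p, obtained by duality as the complement of EF (not p)."""
    return set(states) - ef(edges, set(states) - set(p))


STATES = {0, 1, 2, 3}
EDGES = {(0, "a", 1), (1, "b", 2), (2, "c", 1), (3, "d", 3)}

can_reach_2 = ef(EDGES, {2})
always_in_1_2 = ag(STATES, EDGES, {1, 2})
```

Defining `ag` in terms of `ef` mirrors the macro style mentioned above: only the fixed-point core needs a dedicated implementation.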

Truth offers several model checking algorithms, such as a tableau-based 
model checker. It has fairly good runtime properties and supports 
the full μ-calculus. Furthermore, it is a local model checking algorithm, i.e., it 
has the advantage that in many cases only a part of the transition system has 
to be built in order to verify or to falsify a formula. 

Additionally, a local game-based algorithm has been integrated which can 
be used to demonstrate the invalidity of a formula by means of an interactive 
construction of a counterexample. Again, the process visualization component 
is used to play and visualize this game between the user and the Truth 
tool in order to support debugging of error-prone specifications. 

As mentioned in the introduction, we have chosen a modular design that 
allows easy modifications and extensions of the system. In particular, this feature 
is exploited by a compiler-generator extension which will be described in the 
following section. Figure 2 gives an overview of the software architecture. 




Fig. 2. Structure of Truth/SLC. 



Truth is implemented in Haskell, a general-purpose, fully functional pro- 
gramming language. The choice of a declarative language serves a number of 
purposes. Changes to the system become easier when using a language which 
lacks side effects. Moreover, many algorithms which are employed in the context 
of model checking have a very concise functional notation. This makes the 
implementation easier to understand. Furthermore, in principle it allows one to 
prove the correctness of the implementation, which is crucial for a 
model-checking tool to be used in safety-critical applications. By employing 
optimization techniques such as state monads for destructive updates we achieve 
a runtime behaviour which is competitive with other model-checking tools 
supporting process specifications in CCS. 

3 SLC: The Specification Language Compiler Generator 

A notable extension of Truth is the SLC Specification Language Compiler 
Generator, which provides generic support for different specification formalisms. 
Given a formal description of a specification language, it automatically generates 
a corresponding Truth frontend (cf. Figure 2). 

More specifically, the syntax and semantics of the specification language have 
to be described in terms of Rewriting Logic, a unified semantic framework for 
concurrency. From this definition a compiler is derived which is capable 
of parsing a concrete system specification and of computing the corresponding 
semantic object, such as a labeled transition system. This compiler is linked 
together with the Truth platform to obtain a model-checking tool which is 
tailored for the specification language in question. 

The description of the specification language formalism consists of three 
parts. First, the syntax of the language has to be given in terms of a 
context-free grammar (with typing information). The second part is a set of 
conditional rewrite rules defining the operational semantics. 

Finally, the description contains a set of equations between process terms 
which identify certain states of the respective system, thus reducing the state 
space. Considering CCS for example, we can define equations like x || y = y || x 
and x || nil = x. Then the resulting transition system is minimized with respect 
to symmetry, and, since “dead” nil processes are removed, it is often finite-state 
although the original semantics would yield an infinite system. 
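
The effect of such equations can be sketched by computing a normal form for process terms, so that terms identified by the equations become the same state. The representation below (tuples tagged "par", the constant NIL, the function `normalise`) is hypothetical and only covers the two equations quoted above; it is not how SLC processes its equation sets.

```python
NIL = "nil"


def par(x, y):
    """Parallel composition x || y as a tagged tuple."""
    return ("par", x, y)


def normalise(term):
    """Normal form modulo x || y = y || x and x || nil = x."""
    if isinstance(term, tuple) and term[0] == "par":
        left, right = normalise(term[1]), normalise(term[2])
        if left == NIL:          # nil || x = x follows from the two equations
            return right
        if right == NIL:         # x || nil = x
            return left
        # Commutativity: put the two components in a fixed order.
        first, second = sorted((left, right), key=repr)
        return ("par", first, second)
    return term


t1 = normalise(par("B", par("A", NIL)))
t2 = normalise(par(par(NIL, "A"), "B"))
```

Both terms normalise to the same value, so a state-space generator that stores normal forms would visit this state only once.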

We have successfully developed an instance of Truth for a version of CCS re- 
specting the previous equations. To verify that our approach is also applicable in 
connection with other models of concurrency than labeled transition systems, we 
constructed an implementation for Petri nets. Currently we employ our compiler 
generator to support the distributed functional programming language Erlang. 

4 Truth: The Parallel Version 

Despite the improvements in model checking techniques in recent years, 
the so-called state space explosion still limits their application. While partial 
order reduction or symbolic model checking reduce the state space by orders 
of magnitude, typical verification tasks still last days on a single workstation or 
are even (practically) infeasible due to memory restrictions. 

On the other hand, cheap yet powerful parallel computers can be constructed 
by building Networks Of Workstations (NOWs). From the outside, a NOW appears 
as one single parallel computer with high computing power and, even more 
important, a huge amount of memory. This enables parallel programs to utilize the 
accumulated resources of a NOW to solve large problems. 

Hence, it is a fundamental goal to find parallel model checking algorithms, 
which may then be combined with well-known techniques for avoiding the state 
space explosion to gain even more speedup and further reduce memory requirements. 

We developed a parallel model checking algorithm for the alternation-free 
fragment of the μ-calculus. It distributes the underlying transition system and 
the formula to check over a NOW in parallel and determines, again in parallel, 
whether the initial state of the transition system satisfies the formula. 

Systems with several million states could be constructed within half an 
hour on a NOW consisting of up to 52 processors. We found that the algorithm 
scales very well with respect to run-time and memory consumption when enlarging 
the NOW. Furthermore, the distribution of states over the processors is 
homogeneous. 

While the demand for parallel verification procedures has also attracted several 
other researchers, Parallel Truth is, to our knowledge, the first parallel model 
checking tool that allows the validation of safety and liveness properties. 

A thorough presentation of this algorithm and its runtime properties is given 
elsewhere. 

1 Introduction 

The SLAM toolkit checks safety properties of software without the need for 
user-supplied annotations or abstractions. Given a safety property to check on 
a C program P, the SLAM process iteratively refines a boolean program 
abstraction of P using three tools: 

— C2bp, a predicate abstraction tool that abstracts P into a boolean program 
BP(P, E) with respect to a set of predicates E over P; 

— Bebop, a tool for model checking boolean programs |2|, and 

— Newton, a tool that discovers additional predicates to refine the boolean 
program, by analyzing the feasibility of paths in the C program. 

Property violations are reported by the SLAM toolkit as paths over the pro- 
gram P. Since property checking is undecidable, the SLAM refinement algorithm 
may not converge. We have applied the SLAM toolkit to automatically check 
properties of device drivers taken from the Microsoft Driver Development Kit. 
While checking for various properties, we found that the SLAM process 
converges to a boolean program that is sufficiently precise to validate/invalidate 
the property. 

Several ideas behind the SLAM tools are novel. C2bp is the first automatic 
predicate abstraction tool to handle a full-scale programming language with 
procedure calls and pointers, and perform a sound and precise abstraction. Be- 
bop is the first model checker to handle procedure calls using an interprocedural 
dataflow analysis algorithm, augmented with representation tricks from the sym- 
bolic model checking community. Newton uses a path simulation algorithm in 
a novel way, to generate predicates for refinement. 

2 Overview and Example 

We introduce the SLAM refinement algorithm and apply it to a small code 
example. We have created a low-level specification language called Slic 
(Specification Language for Interface Checking) for stating safety properties. 
Figure 1(a) shows a Slic specification that states that it is an error to acquire 
(or release) a spin lock twice in a row. There are two events on which state 
transitions happen: returns of calls to the functions KeAcquireSpinLock and 
KeReleaseSpinLock. 

We wish to check if a temporal safety property φ specified using Slic is 
satisfied by a program P. We have built a tool that automatically instruments 
the program P with property φ to result in a program P' such that P satisfies 
φ iff the label SLIC_ERROR is not reachable in P'. In particular, the tool first 
creates C code from the Slic specification, as shown in Figure 1(b). The tool 
then inserts calls to the appropriate Slic C functions in the program P to result 
in the instrumented program P'. 

    state {
      enum { Unlocked=0, Locked=1 }
        state = Unlocked;
    }

    KeAcquireSpinLock.return {
      if (state == Locked)
        abort;
      else
        state = Locked;
    }

    KeReleaseSpinLock.return {
      if (state == Unlocked)
        abort;
      else
        state = Unlocked;
    }

    (a)

    enum { Unlocked=0, Locked=1 }
      state = Unlocked;

    void slic_abort() {
      SLIC_ERROR: ;
    }

    void KeAcquireSpinLock_return() {
      if (state == Locked)
        slic_abort();
      else
        state = Locked;
    }

    void KeReleaseSpinLock_return() {
      if (state == Unlocked)
        slic_abort();
      else
        state = Unlocked;
    }

    (b)

Fig. 1. (a) A Slic Specification for Proper Usage of Spin Locks, and (b) Its 
Compilation into C Code. 
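
The state machine encoded by the Slic specification can be mimicked by a small runtime monitor. This is only an illustration of the property itself: SLAM checks it statically over all paths of the instrumented program, whereas the sketch below replays one given trace. The class and function names are hypothetical.

```python
class SlicError(Exception):
    """Raised where the instrumented C code would reach SLIC_ERROR."""


class SpinLockMonitor:
    """Mirrors the two-state machine of the spin lock specification."""

    def __init__(self):
        self.locked = False                  # state = Unlocked

    def acquire_return(self):
        if self.locked:
            raise SlicError("lock acquired twice in a row")
        self.locked = True                   # state = Locked

    def release_return(self):
        if not self.locked:
            raise SlicError("lock released twice in a row")
        self.locked = False                  # state = Unlocked


def trace_is_safe(events):
    """Replay a list of 'acquire'/'release' events through the monitor."""
    monitor = SpinLockMonitor()
    try:
        for event in events:
            if event == "acquire":
                monitor.acquire_return()
            else:
                monitor.release_return()
    except SlicError:
        return False
    return True
```

For instance, the trace acquire, release, acquire, release is accepted, while acquire followed directly by acquire is rejected.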

Now, we wish to check if the label SLIC_ERROR is reachable in the instrumented 
program P'. Let i be a metavariable that records the SLAM iteration 
count. The first iteration (i = 0) starts with the set of predicates E_0 that are 
present in the conditionals of the Slic specification. Let E_i be some set of 
predicates over the state of P'. Then iteration i of SLAM is carried out using the 
following steps: 

1. Apply C2bp to construct the boolean program BP(P', E_i). Program 
BP(P', E_i) is guaranteed to abstract the program P', as every feasible 
execution path p of the program P' also is a feasible execution path of 
BP(P', E_i). 

2. Apply Bebop to check if there is a path p_i in BP(P', E_i) that reaches the 
SLIC_ERROR label. If Bebop determines that SLIC_ERROR is not reachable, 
then the property φ is valid in P, and the algorithm terminates. 

3. If there is such a path p_i, then we use Newton to check if p_i is feasible in 
P'. There are two outcomes: “yes”, the property φ is violated by P and the 
algorithm terminates with an error path p_i; “no”, Newton finds a set of 
predicates F_i that explain the infeasibility of path p_i in P'. 

4. Let E_{i+1} := E_i ∪ F_i, and i := i + 1, and proceed to the next iteration. 
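
The four steps can be summarised as a loop skeleton. In this sketch the three tools are passed in as functions (`abstract`, `model_check`, `analyse_path`), all of which are hypothetical stand-ins for the roles of C2bp, Bebop and Newton; the iteration bound is also an assumption, added because the refinement need not converge. The toy stand-ins at the bottom merely imitate the running example.

```python
def refine(program, initial_predicates, abstract, model_check, analyse_path,
           max_iterations=10):
    """Skeleton of the abstract/check/refine iteration described above."""
    predicates = set(initial_predicates)                  # E_0
    for _ in range(max_iterations):
        boolean_program = abstract(program, predicates)   # step 1
        path = model_check(boolean_program)               # step 2
        if path is None:
            return "valid", predicates                    # error label unreachable
        feasible, explanation = analyse_path(program, path)   # step 3
        if feasible:
            return "violated", path
        predicates |= explanation                         # step 4
    return "unknown", predicates                          # no convergence in bound


# Toy stand-ins: the abstraction is precise enough once NEEDED is tracked.
NEEDED = "nPackets == nPacketsOld"


def toy_abstract(program, predicates):
    return frozenset(predicates)


def toy_model_check(boolean_program):
    return None if NEEDED in boolean_program else ["A", "A", "SLIC_ERROR"]


def toy_analyse_path(program, path):
    return False, {NEEDED}


verdict, final_predicates = refine(
    "P'", {"state == Locked", "state == Unlocked"},
    toy_abstract, toy_model_check, toy_analyse_path,
)
```

With these stand-ins the loop needs two iterations: the first finds a spurious path and learns one predicate, the second proves the error label unreachable.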

Figure 2(a) presents a snippet of (simplified) C code from a PCI device 
driver. Figure 2(b) shows the instrumented program (with respect to the Slic 



262 Thomas Ball and Sriram K. Rajamani 



    void example() {
      do {
        KeAcquireSpinLock();

        nPacketsOld = nPackets;
        req = devExt->WLHV;
        if (req && req->status) {
          devExt->WLHV = req->Next;
          KeReleaseSpinLock();

          irp = req->irp;
          if (req->status > 0) {
            irp->IoS.Status = SUCCESS;
            irp->IoS.Info = req->Status;
          } else {
            irp->IoS.Status = FAIL;
            irp->IoS.Info = req->Status;
          }
          SmartDevFreeBlock(req);
          IoCompleteRequest(irp);
          nPackets++;
        }
      } while (nPackets != nPacketsOld);
      KeReleaseSpinLock();
    }

    (a) Program P

    void example() {
      do {
        KeAcquireSpinLock();
    A:  KeAcquireSpinLock_return();

        nPacketsOld = nPackets;
        req = devExt->WLHV;
        if (req && req->status) {
          devExt->WLHV = req->Next;
          KeReleaseSpinLock();
    B:    KeReleaseSpinLock_return();

          irp = req->irp;
          if (req->status > 0) {
            irp->IoS.Status = SUCCESS;
            irp->IoS.Info = req->Status;
          } else {
            irp->IoS.Status = FAIL;
            irp->IoS.Info = req->Status;
          }
          SmartDevFreeBlock(req);
          IoCompleteRequest(irp);
          nPackets++;
        }
      } while (nPackets != nPacketsOld);
      KeReleaseSpinLock();
    C: KeReleaseSpinLock_return();
    }

    (b) Program P'

Fig. 2. (a) A snippet of device driver code P, and (b) program P' resulting from 
instrumentation of program P due to the Slic specification in Figure 1(a). 



specification in Figure 1(a)). Calls to the appropriate Slic C functions (see 
Figure 1(b)) have been introduced (at labels A, B, and C). 

The question we wish to answer is: is the label SLIC_ERROR reachable in 
the program P' comprised of the code from Figure 1(b) and Figure 2(b)? The 
first step of the algorithm is to generate the initial boolean program. A boolean 
program is a C program in which the only type is boolean. 

For our example, the initial set of predicates E_0 consists of two global 
predicates (state = Locked) and (state = Unlocked) that appear in the conditionals 
of the Slic specification. These two predicates and the program P' are input 
to the C2bp (C to Boolean Program) tool. The translation of the Slic C code 
from Figure 1(b) to the boolean program is shown in Figure 3. The translation 
of the example procedure is shown in Figure 4(a). Together, these two pieces of 
code comprise the boolean program BP(P', E_0) output by C2bp. 

    decl {state==Locked},
         {state==Unlocked} := F,T;

    void slic_abort() begin
      SLIC_ERROR: skip;
    end

    void KeAcquireSpinLock_return()
    begin
      if ({state==Locked})
        slic_abort();
      else
        {state==Locked},
        {state==Unlocked} := T,F;
    end

    void KeReleaseSpinLock_return()
    begin
      if ({state==Unlocked})
        slic_abort();
      else
        {state==Locked},
        {state==Unlocked} := F,T;
    end

Fig. 3. The C code of the Slic specification from Figure 1(b) compiled by C2bp 
into a boolean program. 



As shown in Figure 3, the translation of the Slic C code results in the 
global boolean variables {state==Locked} and {state==Unlocked}.¹ For every 
statement s in the C program and predicate e ∈ E_0, the C2bp tool determines 
the effect of statement s on predicate e and codes that effect in the boolean 
program. Non-determinism is used to conservatively model the conditions in 
the C program that cannot be abstracted precisely using the predicates in E_0, 
as shown in Figure 4(a). Many of the assignment statements in the example 
procedure are abstracted to the skip statement (no-op) in the boolean program. 
The C2bp tool uses an alias analysis to determine whether or not an assignment 
statement through a pointer dereference can affect a predicate e. 

The second step of our process is to determine whether or not the label 
SLIC_ERROR is reachable in the boolean program BP(P', E_0). We use the Bebop 
model checker to determine the answer to this query. In this case, the answer is 
“yes” and Bebop produces a (shortest) path p_0 leading to SLIC_ERROR (specified 
by the sequence of labels [A, A, SLIC_ERROR]). 

Does p_0 represent a feasible execution path of P'? The Newton tool takes a 
C program and a (potential) error path as an input. It uses verification condition 
generation to determine if the path is feasible. If the path is feasible, we have 
found a real error in P'. If the answer is “no” then Newton uses a new algorithm 
to identify a small set of predicates that “explain” why the path is infeasible. 
In the running example, Newton detects that the path p_0 is infeasible, and 
returns a single predicate (nPackets = nPacketsOld) as the explanation for the 
infeasibility. 

Figure 4(b) shows the boolean program BP(P', E_1) that C2bp produces on 
the second iteration of the process. This program has one additional boolean 

¹ Boolean programs permit a variable identifier to be an arbitrary string enclosed 
between “{” and “}”. 






    void example()
    begin
      do
        skip;
    A:  KeAcquireSpinLock_return();
        skip;
        if (*) then
          skip;
    B:    KeReleaseSpinLock_return();
          skip;
          if (*) then
            skip;
          else
            skip;
          fi
          skip;
        fi
      while (*);
      skip;
    C: KeReleaseSpinLock_return();
    end

    (a) Boolean program BP(P', E_0)

    void example()
    begin
      do
        skip;
    A:  KeAcquireSpinLock_return();
        b := T;
        if (*) then
          skip;
    B:    KeReleaseSpinLock_return();
          skip;
          if (*) then
            skip;
          else
            skip;
          fi
          b := b ? F : *;
        fi
      while (!b);
      skip;
    C: KeReleaseSpinLock_return();
    end

    (b) Boolean program BP(P', E_1)

Fig. 4. The two boolean programs created while checking the code from 
Figure 2(b). 



variable (b) that represents the predicate (nPackets = nPacketsOld). The 
assignment statement nPacketsOld = nPackets; makes this condition true, so in 
the boolean program the assignment b := T; represents this assignment. Using 
a theorem prover, C2bp determines that if the predicate is true before the 
statement nPackets++, then it is false afterwards. This is captured by the 
assignment statement in the boolean program “b := b ? F : *;”. Applying Bebop to 
the new boolean program shows that the label SLIC_ERROR is not reachable. 
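
The two reachability results of the running example can be reproduced with a small explicit-state exploration of the boolean programs in Figure 4. The sketch below is hand-written for this one example and does not reflect how Bebop works internally (Bebop uses an interprocedural dataflow analysis); the state encoding (program point, lock state, value of b) and the function names are assumptions.

```python
from collections import deque


def successors(state, track_b):
    """Successors of (pc, locked, b); track_b selects Figure 4(b) over 4(a)."""
    pc, locked, b = state
    if pc in ("A", "B", "C"):
        # Calls into the abstracted Slic monitor: A acquires, B and C release.
        acquiring = pc == "A"
        if locked == acquiring:
            yield ("ERROR", locked, b)            # slic_abort(): SLIC_ERROR
        elif pc == "A":
            yield ("IF", True, True if track_b else b)    # then b := T
        elif pc == "B":
            yield ("INC", False, b)
        else:
            yield ("END", False, b)
    elif pc == "IF":                              # if (*): either branch
        yield ("B", locked, b)
        yield ("WHILE", locked, b)
    elif pc == "INC":                             # b := b ? F : *
        if track_b:
            choices = [False] if b else [False, True]
        else:
            choices = [b]                         # plain skip in Figure 4(a)
        for new_b in choices:
            yield ("WHILE", locked, new_b)
    elif pc == "WHILE":
        if track_b:                               # while (!b)
            yield ("C" if b else "A", locked, b)
        else:                                     # while (*)
            yield ("A", locked, b)
            yield ("C", locked, b)


def error_reachable(track_b):
    """Breadth-first search for the ERROR program point."""
    start = ("A", False, False)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state[0] == "ERROR":
            return True
        for nxt in successors(state, track_b):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Calling `error_reachable(False)` explores the first abstraction, where the path A, A, SLIC_ERROR exists; `error_reachable(True)` explores the refined program, where tracking b rules that path out.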
Abstract. Bytecode verification is a crucial security component for Java 
applets, on the Web and on embedded devices such as smart cards. This 
paper describes the main bytecode verification algorithms and surveys 
the variety of formal methods that have been applied to bytecode verifi- 
cation in order to establish its correctness. 



1 Introduction 

Web applets have popularized the idea of downloading and executing untrusted 
compiled code on the personal computer running the Web browser, without the 
user’s approval or intervention. Obviously, this raises major security issues: 
without appropriate security measures, a malicious applet could mount a variety 
of attacks against the local computer, such as destroying data (e.g. reformatting 
the disk), modifying sensitive data (e.g. registering a bank transfer via the 
Quicken home-banking software), divulging personal information over the network, 
or modifying other programs (Trojan attacks). 

To make things worse, the applet model is now being transferred to high- 
security embedded devices such as smart cards: the Java Card architecture 0 
allows for post-issuance downloading of applets on smart cards in sensitive
application areas such as payment and mobile telephony. This raises the stakes
enormously: a security hole that allows a malicious applet to crash Windows 
is perhaps tolerable, but is certainly not acceptable if it allows the applet to 
perform non-authorized credit card transactions. 

The solution put forward by the Java programming environment is to execute 
the applets in a so-called “sandbox”, which is an insulation layer preventing
direct access to the hardware resources and implementing a suitable access control
policy msm . The security of the sandbox model relies on the following three 
components: 

1. Applets are not compiled down to machine executable code, but rather to 
bytecode for a virtual machine. The virtual machine manipulates higher- 
level, more secure abstractions of data than the hardware processor, such as 
object references instead of memory addresses. 

2. Applets are not given direct access to hardware resources such as the se- 
rial port, but only to a carefully designed set of API classes and methods 
that perform suitable access control before performing interactions with the 
outside world on behalf of the applet.

3. Upon downloading, the bytecode of the applet is subject to a static analysis
called bytecode verification, whose purpose is to make sure that the code 
of the applet is well typed and does not attempt to bypass protections 1 
and 2 above by performing ill-typed operations at run-time, such as forging 
object references from integers, illegal casting of an object reference from 
one class to another, calling directly private methods of the API, jumping in 
the middle of an API method, or jumping to data as if it were code.

Thus, bytecode verification is a crucial security component in the Java
“sandbox” model: any bug in the verifier causing an ill-typed applet to be accepted
can potentially enable a security attack. At the same time, bytecode verification 
is a complex process involving elaborate program analyses. Consequently, con- 
siderable research efforts have been expended to specify the goals of bytecode 
verification, formalize bytecode verification algorithms, and prove their correct- 
ness. 

The purpose of the present paper is to survey briefly this formal work on 
bytecode verification. We explain what bytecode verification is, survey the var- 
ious algorithms that have been proposed, outline the main problems they are 
faced with, and give references to formal proofs of correctness. The thesis of this 
paper is that bytecode verification can be (and has been) attacked from many 
different angles, including dataflow analyses, abstract interpretation, type sys- 
tems, model checking, and machine-checked proofs; thus, bytecode verification 
provides an interesting playground for applying and relating various techniques 
in computer-aided verification and formal methods in computing.

The remainder of this paper is organized as follows. Section 2 gives a quick 
overview of the Java virtual machine and of bytecode verification. Section 3 
presents the basic bytecode verification algorithm based on dataflow analysis. 
Sections 4 and 5 concentrate on two delicate verification issues: checking ob- 
ject initialization and dealing with JVM subroutines. Section 6 presents a more 
abstract view of bytecode verification as model checking of an abstract interpre- 
tation. Some issues specific to low-resources embedded systems are discussed in 
section 7, followed by conclusions and perspectives in section 8. 

2 Overview of the JVM and of Bytecode Verification 

The Java Virtual Machine (JVM) is a conventional stack-based abstract 
machine. Most instructions pop their arguments off the stack, and push back 
their results on the stack. In addition, a set of registers (also called local vari- 
ables) is provided; they can be accessed via “load” and “store” instructions that 
push the value of a given register on the stack or store the top of the stack in 
the given register, respectively. While the architecture does not mandate it, most 
Java compilers use registers to store the values of source-level local variables and 
method parameters, and the stack to hold temporary results during evaluation 
of expressions. Both the stack and the registers are part of the activation record 
for a method. Thus, they are preserved across method calls. The entry point for 
a method specifies the number of registers and stack slots used by the method,
thus allowing an activation record of the right size to be allocated on method
entry.

Control is handled by a variety of intra-method branch instructions: uncon- 
ditional branch (“goto”), conditional branches (“branch if top of stack is 0”), 
multi-way branches (corresponding to the switch Java construct). Exception 
handlers can be specified as a table of $(pc_1, pc_2, C, h)$ quadruples, meaning that
if an exception of class $C$ or a subclass of $C$ is raised by any instruction between
locations $pc_1$ and $pc_2$, control is transferred to the instruction at $h$ (the exception
handler).

About 200 instructions are supported, including arithmetic operations, com- 
parisons, object creation, field accesses and method invocations. The example in 
Fig. 1 should give the general flavor of JVM bytecode.



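Source Java code:

    static int factorial(int n)
    {
        int res;
        for (res = 1; n > 0; n--) res = res * n;
        return res;
    }

Corresponding JVM bytecode:

    method static int factorial(int), 2 registers, 2 stack slots
     0: iconst_1      // push the integer constant 1
     1: istore_1      // store it in register 1 (the res variable)
     2: iload_0       // push register 0 (the n parameter)
     3: ifle 14       // if negative or null, go to PC 14
     6: iload_1       // push register 1 (res)
     7: iload_0       // push register 0 (n)
     8: imul          // multiply the two integers at the top of the stack
     9: istore_1      // pop the result and store it in register 1 (res)
    10: iinc 0, -1    // decrement register 0 (n) by 1
    11: goto 2        // go back to PC 2
    14: iload_1       // push register 1 (res)
    15: ireturn       // return its value to the caller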



Fig. 1. An Example of JVM Bytecode. 
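
To make the stack-and-register discipline of Fig. 1 concrete, here is a minimal sketch of an interpreter for just the handful of instructions used in that figure. It is an illustration only: the function `run`, the dictionary encoding of the code, and the opcode spellings (`"iconst"`, `"iload"`, and so on, with the register number as a separate operand) are assumptions made for this example and are not part of the JVM specification. Python integers are unbounded, so 32-bit overflow is not modeled.

```python
def run(code, args, n_regs):
    """Interpret a tiny, illustrative subset of JVM-style bytecode.

    `code` maps each program counter to an (opcode, operand) pair;
    `args` are the method parameters, placed in registers 0..len(args)-1.
    """
    pcs = sorted(code)
    following = dict(zip(pcs, pcs[1:]))   # PC of the textually next instruction
    regs = list(args) + [None] * (n_regs - len(args))
    stack = []
    pc = pcs[0]
    while True:
        op, arg = code[pc]
        next_pc = following.get(pc)
        if op == "iconst":                # push an integer constant
            stack.append(arg)
        elif op == "iload":               # push the value of a register
            stack.append(regs[arg])
        elif op == "istore":              # pop the top of the stack into a register
            regs[arg] = stack.pop()
        elif op == "imul":                # pop two integers, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "iinc":                # add a constant to a register in place
            reg, delta = arg
            regs[reg] += delta
        elif op == "ifle":                # branch if the popped value is <= 0
            if stack.pop() <= 0:
                next_pc = arg
        elif op == "goto":                # unconditional branch
            next_pc = arg
        elif op == "ireturn":             # return the integer at the top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {op}")
        pc = next_pc


# The bytecode of Fig. 1, in the hypothetical encoding used by `run`.
FACTORIAL = {
    0: ("iconst", 1), 1: ("istore", 1), 2: ("iload", 0), 3: ("ifle", 14),
    6: ("iload", 1), 7: ("iload", 0), 8: ("imul", None), 9: ("istore", 1),
    10: ("iinc", (0, -1)), 11: ("goto", 2), 14: ("iload", 1), 15: ("ireturn", None),
}

print(run(FACTORIAL, [5], 2))
```

Note that this interpreter performs no checks at all: it would happily pop from an empty stack or multiply a reference by an integer, which is exactly what the conditions discussed next are meant to rule out.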



An important feature of the JVM is that most instructions are typed. For 
instance, the iadd instruction (integer addition) requires that the stack initially 
contains at least two elements, and that these two elements are of type int; it 
then pushes back a result of type int. Similarly, a getfield C.f.τ instruction
(access the instance field f of type τ declared in class C) requires that the top of
the stack contains a reference to an instance of class C or one of its sub-classes
(and not, for instance, an integer - this would correspond to an attempt to forge
an object reference by an unsafe cast); it then pops it and pushes back a value
of type τ (the value of the field f). More generally, proper operation of the JVM
is not guaranteed unless the code meets the following conditions: 

— Type correctness: the arguments of an instruction are always of the types 
expected by the instruction. 

— No stack overflow or underflow: an instruction never pops an argument off 
an empty stack, nor pushes a result on a full stack (whose size is equal to 
the maximal stack size declared for the method). 

— Code containment: the program counter must always point within the code 
for the method, to the beginning of a valid instruction encoding (no falling 
off the end of the method code; no branches into the middle of an instruction 
encoding).

— Register initialization: a load from a register must always follow at least one 
store in this register; in other terms, registers that do not correspond to 
method parameters are not initialized on method entrance, and it is an error 
to load from an uninitialized register. 

— Object initialization: when an instance of a class C is created, one of the 
initialization methods for class C (corresponding to the constructors for this 
class) must be invoked before the class instance can be used. 

— Access control: method invocations, field accesses and class references must 
respect the visibility modifiers (private, protected, public, etc) of the 
method, field or class. 

One way to guarantee these conditions is to check them dynamically, while 
executing the bytecode. This is called the “defensive JVM approach” in the liter- 
ature 1^. However, checking these conditions at run-time is expensive and slows 
down execution significantly. The purpose of bytecode verification is to check 
these conditions once and for all, by static analysis of the bytecode at loading- 
time. Bytecode that passes verification can then be executed at full speed, with- 
out extra dynamic checks. 



3 Basic Verification by Dataflow Analysis

The first JVM bytecode verification algorithm is due to Gosling and Yellin at
Sun [9,36,15]. Almost all existing bytecode verifiers implement this algorithm.
It can be summarized as a dataflow analysis applied to a type-level abstract
interpretation of the virtual machine. Some advanced aspects of the algorithm
that go beyond standard dataflow analysis are described in sections 4 and 5. In
this section, we describe the basic ingredients of this algorithm: the type-level
abstract interpreter and the dataflow framework.

3.1 The Type-Level Abstract Interpreter 

At the heart of all bytecode verification algorithms described in this paper is an 
abstract interpreter for the JVM instruction set that executes JVM instructions
like a defensive JVM (including type tests, stack underflow and overflow tests,
etc), but operates over types instead of values. That is, the abstract interpreter 
manipulates a stack of types and a register type (an array associating types 
to register numbers). It simulates the execution of instructions at the level of 
types. For instance, for the iadd instruction (integer addition), it checks that 
the stack of types contains at least two elements, and that the top two elements 
are of type int. It then pops the top two elements and pushes back the type
int corresponding to the result of the addition. 



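$$\begin{array}{rl}
\texttt{iconst } n : & (S, R) \to (\texttt{int}.S,\ R) \quad \text{if } |S| < M_{stack} \\
\texttt{iadd} : & (\texttt{int}.\texttt{int}.S,\ R) \to (\texttt{int}.S,\ R) \\
\texttt{iload } n : & (S, R) \to (\texttt{int}.S,\ R) \quad \text{if } 0 \le n < M_{reg} \text{ and } R(n) = \texttt{int} \text{ and } |S| < M_{stack} \\
\texttt{istore } n : & (\texttt{int}.S,\ R) \to (S,\ R\{n \leftarrow \texttt{int}\}) \quad \text{if } 0 \le n < M_{reg} \\
\texttt{aconst\_null} : & (S, R) \to (\texttt{null}.S,\ R) \quad \text{if } |S| < M_{stack} \\
\texttt{aload } n : & (S, R) \to (R(n).S,\ R) \quad \text{if } 0 \le n < M_{reg} \text{ and } R(n) <: \texttt{Object} \text{ and } |S| < M_{stack} \\
\texttt{astore } n : & (\tau.S,\ R) \to (S,\ R\{n \leftarrow \tau\}) \quad \text{if } 0 \le n < M_{reg} \text{ and } \tau <: \texttt{Object} \\
\texttt{getfield } C.f.\tau : & (\texttt{ref}(D).S,\ R) \to (\tau.S,\ R) \quad \text{if } D <: C \\
\texttt{invokestatic } C.m.\sigma : & (\tau'_n \ldots \tau'_1.S,\ R) \to (\tau.S,\ R) \quad \text{if } \sigma = \tau(\tau_1, \ldots, \tau_n) \text{ and } \tau'_i <: \tau_i \text{ for } i = 1 \ldots n
\end{array}$$

Fig. 2. Selected rules for the type-level abstract interpreter. $M_{stack}$ is the
maximal stack size and $M_{reg}$ the maximal number of registers.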



Figure 2 defines more formally the abstract interpreter on a number of
representative JVM instructions. The abstract interpreter is presented as a transition
relation $i : (S, R) \to (S', R')$, where $i$ is the instruction, $S$ and $R$ the stack type
and register type before executing the instruction, and $S'$ and $R'$ the stack type
and register type after executing the instruction. Errors such as type mismatches 
on the arguments, stack underflow, or stack overflow, are denoted by the absence 
of a transition. For instance, there is no transition on iadd from an empty stack. 
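
The transition relation can be sketched as a function that either returns the new abstract state or signals that no transition exists. The following is a minimal illustration covering four of the integer instructions only; the names `step`, `verify_straight_line`, `VerifyError`, and the string encoding of types (with `"top"` standing for the type of an uninitialized register) are assumptions of this sketch, not taken from any actual verifier.

```python
class VerifyError(Exception):
    """Raised when the abstract interpreter has no transition (it is stuck)."""


TOP = "top"  # type of an uninitialized register


def step(instr, stack, regs, m_stack):
    """Return (S', R') for the transition instr : (S, R) -> (S', R').

    `stack` is a tuple of type names with the top of the stack first;
    `regs` is a tuple of type names indexed by register number;
    `m_stack` is the maximal stack size declared for the method.
    """
    op, n = instr
    m_reg = len(regs)
    if op == "iconst" and len(stack) < m_stack:
        return ("int",) + stack, regs
    if op == "iadd" and stack[:2] == ("int", "int"):
        return ("int",) + stack[2:], regs
    if op == "iload" and 0 <= n < m_reg and regs[n] == "int" and len(stack) < m_stack:
        return ("int",) + stack, regs
    if op == "istore" and 0 <= n < m_reg and stack[:1] == ("int",):
        return stack[1:], regs[:n] + ("int",) + regs[n + 1:]
    # Absence of a transition: type mismatch, stack underflow or overflow, etc.
    raise VerifyError(f"no transition for {instr} from state {(stack, regs)}")


def verify_straight_line(code, regs, m_stack):
    """Iterate `step` over branch-free code, starting from an empty stack."""
    state = ((), regs)
    for instr in code:
        state = step(instr, *state, m_stack)
    return state


code = [("iconst", 1), ("istore", 1), ("iload", 1), ("iload", 1), ("iadd", None)]
print(verify_straight_line(code, ("int", TOP), m_stack=2))
```

The operand carried by `iconst` is ignored on purpose: at the level of types, only the fact that an `int` is pushed matters.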

Notice that method invocations (such as the invokestatic instruction in
Fig. 2) are not treated by branching to the code of the invoked method, as the
concrete JVM does. Instead, the abstract interpreter simply assumes that the effect
of the method invocation on the stack is as described by the method signature
given in the “invoke” instruction. All bytecode verification algorithms described in this paper proceed
method per method, assuming that all other methods are well-typed when veri- 
fying the code of a method. A simple coinductive argument shows that if this is 
the case, the program as a whole (the collection of all methods) is well typed. 

The types manipulated by the abstract interpreter are similar to the source- 
level types of the Java language. They include primitive types (int, long, float, 
double), object reference types represented by the fully qualified names of the 
corresponding classes, and array types. The boolean, byte, short and char 
types of Java are identified with int. Two extra types are introduced: null to
represent the type of the null reference, and ⊤ to represent the contents of
uninitialized registers, that is, any value. (“Load” instructions explicitly check that
the accessed register does not have type ⊤, thus detecting accesses to
uninitialized registers.) A subtyping relation between these types, similar to that of the
Java language (the “assignment compatibility” relation), is defined as shown in
Fig. 3.




Fig. 3. Type expressions used by the verifier, with their subtyping relation. C, 
D, E are user-defined classes, with D and E extending C. Not all types are shown. 



3.2 The Dataflow Analysis 

Verifying a method whose body is a straight-line piece of code (no branches) is 
easy: we simply iterate the transition function of the abstract interpreter over 
the instructions, taking the stack type and register type “after” the preceding 
instruction as the stack type and register type “before” the next instruction. The 
initial stack and register types reflect the state of the JVM on method entrance: 
the stack type is empty; the types of the registers $0 \ldots n-1$ corresponding to
the $n$ method parameters are set to the types of the corresponding parameters
in the method signature; the other registers $n \ldots M_{reg}-1$ corresponding to
uninitialized local variables are given the type ⊤.

If the abstract interpreter gets “stuck”, i.e. cannot make a transition from 
one of the intermediate states, then verification fails and the code is rejected. 
Otherwise, verification succeeds, and since the abstract interpreter is a correct 
approximation of a defensive JVM, we are certain that a defensive JVM will
not get stuck either when executing the code. Thus, the code is correct and can be
executed safely by a regular, non-defensive JVM. 

Branches and exception handlers introduce forks and joins in the control flow 
of the method. Thus, an instruction can have several predecessors, with different 
stack and register types “after” these predecessor instructions. Sun’s bytecode
verifier deals with this situation in the manner customary for dataflow analysis:
the state (stack type and register type) “before” an instruction is taken to be
the least upper bound of the states “after” all predecessors of this instruction.
For instance, assume classes $C_1$ and $C_2$ extend $C$, and we analyze a conditional
construct that stores a value of type $C_1$ in register 0 in one arm, and a value of
type $C_2$ in the other arm. (See Fig. 4.) When the two arms meet, register 0 is
assumed to have type $C$, which is the least upper bound (the smallest common
supertype) of $C_1$ and $C_2$.



[Figure: in one arm of the conditional, r0 : C1; in the other arm, r0 : C2; at the join point, r0 : C = lub(C1, C2).]



Fig. 4. Handling Joins in the Control Flow. 



More precisely, writing $in(i)$ for the state “before” instruction $i$ and $out(i)$
for the state “after” $i$, the algorithm sets up the following dataflow equations:

$$i : in(i) \to out(i)$$

$$in(i) = \mathrm{lub}\{\, out(j) \mid j \text{ predecessor of } i \,\}$$

for every instruction $i$, plus

$$in(i_0) = (\varepsilon, (P_0, \ldots, P_{n-1}, \top, \ldots, \top))$$

for the start instruction $i_0$ (the $P_k$ are the types of the method parameters).
These equations are then solved by standard fixpoint iteration using Kildall’s 
worklist algorithm (El section 8.4]: an instruction i is taken from the worklist 
and its state “after” $out(i)$ is determined from its state “before” $in(i)$ using the
abstract interpreter; then, we replace $in(j)$ by $\mathrm{lub}(in(j), out(i))$ for each successor
$j$ of $i$, and enter those successors $j$ for which $in(j)$ changed in the worklist.
The fixpoint is reached when the worklist is empty, in which case verification 
succeeds. Verification fails if a state with no transition is encountered, or one of 
the least upper bounds is undefined. 
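
The worklist iteration just described can be sketched in a few lines. This is only an illustration under simplifying assumptions: the function `solve` and its parameters are hypothetical names, the state is reduced to a tuple of register types (the stack is left out), and the toy program mimics the conditional of Fig. 4, where one arm stores a `C1` in register 0 and the other a `C2`. An instruction that has not been reached yet has no entry in `in_state`, which plays the role of the bottom element.

```python
def solve(entry, initial, successors, transfer, lub):
    """Kildall-style worklist iteration.

    `transfer(i, state)` computes out(i) from in(i) and may raise an
    exception when the abstract interpreter is stuck; `lub` merges two states.
    Returns the mapping i -> in(i) at the fixpoint.
    """
    in_state = {entry: initial}
    worklist = [entry]
    while worklist:
        i = worklist.pop()
        out_i = transfer(i, in_state[i])
        for j in successors(i):
            merged = lub(in_state[j], out_i) if j in in_state else out_i
            if in_state.get(j) != merged:   # in(j) changed: revisit j
                in_state[j] = merged
                worklist.append(j)
    return in_state


# Toy class hierarchy: C1 and C2 extend C, which extends Object.
PARENT = {"C1": "C", "C2": "C", "C": "Object", "Object": None}


def supertypes(t):
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def lub_type(a, b):
    if "top" in (a, b):
        return "top"
    return next(t for t in supertypes(a) if t in supertypes(b))


def lub_regs(r1, r2):
    return tuple(lub_type(a, b) for a, b in zip(r1, r2))


# pc -> (class stored into register 0, or None; list of successor pcs)
PROGRAM = {
    0: (None, [1, 2]),   # conditional branch to the two arms
    1: ("C1", [3]),      # first arm stores a C1 in register 0
    2: ("C2", [3]),      # second arm stores a C2 in register 0
    3: (None, []),       # join point
}


def transfer(pc, regs):
    stored = PROGRAM[pc][0]
    return regs if stored is None else (stored,) + regs[1:]


result = solve(0, ("top",), lambda pc: PROGRAM[pc][1], transfer, lub_regs)
print(result[3])
```

At the join point the two incoming states `("C1",)` and `("C2",)` are merged into `("C",)`, whichever arm the worklist happens to process first.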

As a trivial optimization of the algorithm above, the dataflow equations can 
be set up at the level of extended basic blocks rather than individual instructions. 
In other terms, it suffices to keep in working memory the states $in(i)$ where $i$ is
the first instruction of an extended basic block (i.e. a branch target); the other 
states can be recomputed on the fly as needed. 

The least upper bound of two states is taken pointwise, both on the stack 
types and the register types. It is undefined if the stack types have different 
heights, which causes verification to fail. This situation corresponds to a program 
point where the run-time stack can have different heights depending on the path
by which the point is reached; such code must be rejected because it can lead to
unbounded stack height, and therefore to stack overflow. (Consider a loop that 
pushes one more entry on the stack at each iteration.) 

The least upper bound of two register types can be ⊤, causing this register
to have type ⊤ in the merged state. This corresponds to the situation where
a register holds values of incompatible types in two arms of a conditional (e.g.
int in one arm and an object reference in the other), and therefore is treated
as uninitialized (no further loads from this register) after the merge point. The
least upper bound of two stack slots can also be ⊤, in which case Sun’s
algorithm aborts verification immediately. Alternatively, it is entirely harmless to
continue verification after setting the stack slot to ⊤ in the merged state, since
the corresponding value cannot be used by any well-typed instruction, but only
discarded by instructions such as pop or return.
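
The pointwise merge of two states, including both treatments of an incompatible stack slot, might look as follows. This is a sketch under stated assumptions: `lub_state`, `lub_type`, the `strict_stack` flag, and the small `SUPERCLASS` table are names invented for the example, the `null` type is left out for brevity, and `"top"` again stands for the type of an unusable slot.

```python
class VerifyError(Exception):
    """Raised when the least upper bound of two states is undefined."""


TOP = "top"
# Hypothetical class hierarchy; anything not listed here (e.g. "int") is
# treated as a non-reference type.
SUPERCLASS = {"Object": None, "C": "Object", "C1": "C", "C2": "C"}


def lub_type(a, b):
    if a == b:
        return a
    if a not in SUPERCLASS or b not in SUPERCLASS:
        return TOP                      # e.g. int vs. reference, or TOP itself
    ancestors_of_b = set()
    while b is not None:
        ancestors_of_b.add(b)
        b = SUPERCLASS[b]
    while a not in ancestors_of_b:      # terminates: Object is always an ancestor
        a = SUPERCLASS[a]
    return a


def lub_state(state1, state2, strict_stack=True):
    """Merge two (stack type, register type) pairs pointwise."""
    (stack1, regs1), (stack2, regs2) = state1, state2
    if len(stack1) != len(stack2):
        raise VerifyError("stack heights differ at merge point")
    stack = tuple(lub_type(a, b) for a, b in zip(stack1, stack2))
    if strict_stack and TOP in stack:
        # The behavior attributed to Sun's algorithm: abort immediately.
        raise VerifyError("incompatible stack slot types at merge point")
    regs = tuple(lub_type(a, b) for a, b in zip(regs1, regs2))
    return stack, regs


merged = lub_state((("C1",), ("int", "C1")), (("C2",), ("C2", "C2")))
print(merged)
```

Passing `strict_stack=False` corresponds to the alternative mentioned above, in which the merged stack slot is simply given the unusable type and verification continues.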

3.3 Interfaces and Least Upper Bounds 

The dataflow framework presented above requires that the type algebra, ordered 
by the subtyping relation, constitutes a semi-lattice. That is, every pair of types 
possesses a smallest common supertype (least upper bound). 

Unfortunately, this property does not hold if we take the verifier type
algebra to be the Java source-level type algebra (extended with ⊤ and null) and
the subtyping relation to be the Java source-level assignment compatibility
relation. The problem is that interfaces are types, just like classes, and a class can
implement several interfaces. Consider the following classes: 

    interface I { ... }
    interface J { ... }
    class C1 implements I, J { ... }
    class C2 implements I, J { ... }

The subtyping relation induced by these declarations is: 



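[Diagram: Object at the top; the interfaces I and J directly below Object; C1 and C2 each directly below both I and J.]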




This is obviously not a semi-lattice, since the two types C1 and C2 have two
common super-types I and J that are not comparable (neither is a subtype of the
other). 

There are several ways to address this issue. One approach is to manipulate 
sets of types during verification instead of single types as we described earlier. 
These sets of types are to be interpreted as conjunctive types, i.e. the set {I, J},
like the conjunctive type I ∧ J, represents values that have both types I and
J, and therefore is a suitable least upper bound for the types {C1} and {C2}
in the example above. This is the approach followed by Qian I2nj and also by 
Pusch prj . 
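
One simple way to compute such a set-valued least upper bound is to intersect the supertypes of both operands and keep only the minimal elements. The sketch below illustrates the idea on the example above; it is not claimed to be the procedure used in the cited works, and the names `DIRECT_SUPERTYPES`, `supertypes`, `upper_bounds`, and `lub_sets` are hypothetical.

```python
# Direct supertypes for the example: C1 and C2 both implement I and J.
DIRECT_SUPERTYPES = {
    "Object": set(),
    "I": {"Object"},
    "J": {"Object"},
    "C1": {"Object", "I", "J"},
    "C2": {"Object", "I", "J"},
}


def supertypes(t):
    """All supertypes of t, including t itself."""
    result = {t}
    for s in DIRECT_SUPERTYPES[t]:
        result |= supertypes(s)
    return result


def upper_bounds(type_set):
    """Every type that a value of the conjunctive type `type_set` is known to have."""
    return set().union(*(supertypes(t) for t in type_set))


def lub_sets(a, b):
    """Least upper bound of two conjunctive types, each given as a set of types."""
    common = upper_bounds(a) & upper_bounds(b)
    # Keep only minimal elements: drop t if some other common type is below it.
    return {t for t in common
            if not any(u != t and t in supertypes(u) for u in common)}


print(sorted(lub_sets({"C1"}, {"C2"})))
```

For `{C1}` and `{C2}` the common supertypes are `I`, `J`, and `Object`; `Object` is dropped because `I` lies below it, leaving the conjunctive type `{I, J}`.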

Another approach is to complete the class and interface hierarchy of the
program into a lattice before performing verification. In the example above, the
completion would add a pseudo-interface IandJ extending both I and J, and
claim that C1 and C2 implement IandJ rather than I and J. We then obtain the
following semi-lattice:




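[Diagram: Object at the top; I and J directly below Object; IandJ directly below both I and J; C1 and C2 directly below IandJ.]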



The pseudo-interface IandJ plays the same role as the set type {I, J} in
the first approach described above. The difference is that the completion of the 
class/interface hierarchy is performed once and for all, and verification manipu- 
lates only simple types rather than sets of types. This keeps verification simple 
and fast. 

The simplest solution to the interface problem is to be found in Sun’s imple- 
mentation of the JDK bytecode verifier. (This approach is documented nowhere, 
but can easily be inferred by experimentation.) Namely, bytecode verification ig- 
nores interfaces, treating all interface types as the class type Object. Thus, the 
type algebra used by the verifier contains only proper classes and no interfaces, 
and subtyping between proper classes is simply the inheritance relation between 
them. Since Java has single inheritance (a class can implement several interfaces, 
but inherit from one class only), the subtyping relation is tree-shaped and triv- 
ially forms a lattice: the least upper bound of two classes is simply their closest 
common ancestor in the inheritance tree. 
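
With interfaces erased to Object and single inheritance between classes, the least upper bound reduces to a walk up the inheritance tree. A minimal sketch, assuming a hypothetical `SUPERCLASS` table (classes C, D, E as in Fig. 3, plus an extra unrelated class `Other`) and a hypothetical set `INTERFACES` of interface names:

```python
# Hypothetical single-inheritance hierarchy: D and E extend C.
SUPERCLASS = {"Object": None, "C": "Object", "D": "C", "E": "C", "Other": "Object"}
INTERFACES = {"I", "J"}   # all interface types are treated as Object


def erase(t):
    """Replace an interface type by the class type Object."""
    return "Object" if t in INTERFACES else t


def ancestors(cls):
    """The chain cls, superclass of cls, ..., Object."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUPERCLASS[cls]
    return chain


def lub_class(a, b):
    """Closest common ancestor of two (erased) types in the inheritance tree."""
    a, b = erase(a), erase(b)
    seen = set(ancestors(a))
    return next(c for c in ancestors(b) if c in seen)


print(lub_class("D", "E"), lub_class("D", "Other"), lub_class("D", "I"))
```

Because every chain ends at Object, the search always succeeds, which is why the least upper bound is always defined in this type algebra.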

The downside of Sun’s approach, compared with the set-based or completion- 
based approach, is that the verifier cannot guarantee statically that an object 
reference implements a given interface. In particular, the invokeinterface I.m
instruction, which invokes method m of interface I on an object, is not
guaranteed to receive at run-time an object that actually implements I: the only
guarantee provided by Sun’s verifier is that it receives an argument of type
Object, that is, any object reference. The invokeinterface I.m instruction
must therefore check dynamically that the object actually implements I, and 
raise an exception if it does not. 

3.4 Formalizations and Proofs 

Many formalizations and proofs of correctness of Java bytecode verification have 
been published, and we have reasons to believe that many more have been devel- 
oped internally, both in academia and industry. With no claims to exhaustive- 
ness, we will mention the works of Cohen p] and Qian |2S! among the first formal 
specifications of the JVM. Qian’s specification is written in ordinary mathemat- 
ics, while Cohen’s uses the specification language of the ACL2 theorem prover. 
Pusch m uses the Isabelle/HOL prover to formalize the dynamic semantics of 
a fragment of the JVM, the corresponding type-level abstract interpreter used
by the verifier, and proves the correctness of the latter with respect to the
former: if the abstract interpreter can do a transition $i : (S, R) \to (S', R')$, then
for all concrete states $(s, r)$ matching $(S, R)$, the concrete interpreter can do a
transition $i : (s, r) \to (s', r')$, and the final concrete state $(s', r')$ matches $(S', R')$.
Nipkow Em formalizes the dataflow analysis framework in Isabelle/HOL and 
proves its correctness. 

4 Verifying Object Initialization 

Object creation in the Java virtual machine is a two-step process: first, the 
instruction new C creates a new object, instance of the class C, with all in- 
stance fields filled with default values (0 for numerical fields and null for refer- 
ence fields); second, one of the initializer methods for class C (methods named 
C.<init> resulting from the compilation of the constructor methods of C) must 
be invoked on the newly created object. Initializer methods, just like their source- 
level counterpart (constructors), are typically used to initialize instance fields to 
non-default values, although they can also perform nearly arbitrary computa- 
tions. 

The JVM specification requires that this two-step object initialization pro- 
tocol be respected. That is, the object instance created by the new instruction 
is considered uninitialized, and none of the regular object operations (i.e. store 
the object in a data structure, return it as method result, access one of its fields, 
invoke one of its methods) is allowed on this uninitialized object. Only when one 
of the initializer methods for its class is invoked on the new object and returns
normally is the new object considered fully initialized and usable like any other 
object. 

Unlike the register initialization property, this object initialization property is 
not crucial to ensure type safety at run-time: since the new instruction initializes 
the instance fields of the new object with correct values for their types, type 
safety is not broken if the resulting default-initialized object is used right away 
without having called an initializer method. However, the object initialization 
property is important to ensure that some invariants between instance fields that
are established by the constructor of a class actually hold for all objects of this
class. 

Static verification of object initialization is made more complex by the fact 
that initialization methods operate by side-effect: instead of taking an uninitial- 
ized object and returning an initialized object, they simply take an uninitialized 
object, update its fields, and return nothing. Hence, the code generated by Java 
compilers for the source-level statement x = new C(arg) is generally of the fol- 
lowing form: 

    new C                    // create uninitialized instance of C
    dup                      // duplicate the reference to this instance
    code to compute arg
    invokespecial C.<init>   // call the initializer
    astore 3                 // store initialized object in x

That is, two references to the uninitialized instance of C are held on the stack.
The topmost reference is “consumed” by the invocation of C.<init>. When 
this initializer returns, the second reference is now at the top of the stack and 
now references a properly initialized object, which is then stored in the register 
allocated to x. The tricky point is that the initializer method is applied to one 
object reference on the stack, but it is another object reference contained in 
the stack (which happens to reference the same object) whose status goes from 
“uninitialized” to “fully initialized” in the process. 

As demonstrated above, static verification of object initialization requires a 
form of alias analysis (more precisely a must-alias analysis) to determine which 
object references in the current state are guaranteed to refer to the same
uninitialized object that is passed as argument to an initializer method. While any
must-alias analysis can be used, Sun’s verifier uses a fairly simple analysis,
whereby an uninitialized object is identified by the position (program counter
value) of the new instruction that created it. More precisely, the type algebra is
enriched by the types $C_p$ denoting an uninitialized instance of class C created
by a new instruction at PC p. An invocation of an initializer method C.<init>
checks that the first argument of the method is of type $C_p$ for some p, then pops
the arguments off the stack type as usual, and finally finds all other occurrences
of the type $C_p$ in the abstract interpreter state (stack type and register types)
and replaces them by C. The following example shows how this works for a nested
initialization corresponding to the Java expression new C(new C(null)): 



     0: new C                    // stack type after: C_0
     3: dup                      //                   C_0, C_0
     4: new C                    //                   C_0, C_0, C_4
     7: dup                      //                   C_0, C_0, C_4, C_4
     8: aconst_null              //                   C_0, C_0, C_4, C_4, null
     9: invokespecial C.<init>   //                   C_0, C_0, C
    12: invokespecial C.<init>   //                   C
    15: ...

In particular, the first invokespecial initializes only the instance created at 
PC 4, but not the one created at PC 0. 
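
The effect of an initializer call on the abstract state can be sketched as follows. The encoding is an assumption of this example: an uninitialized type $C_p$ is represented by the tuple `("uninit", "C", p)`, the stack is a list with its top at the end (matching the left-to-right order of the listing above), and the helper names `uninit` and `invoke_init` are hypothetical. Type checking of the ordinary constructor arguments is omitted.

```python
class VerifyError(Exception):
    """Raised when the initializer call is not acceptable."""


def uninit(cls, pc):
    """The type of an uninitialized instance of `cls` created by `new` at `pc`."""
    return ("uninit", cls, pc)


def invoke_init(stack, regs, cls, n_args):
    """Abstract effect of `invokespecial cls.<init>` taking `n_args` arguments.

    Pops the arguments and the receiver, then replaces every remaining
    occurrence of the receiver's uninitialized type by the initialized type.
    """
    if len(stack) < n_args + 1:
        raise VerifyError("stack underflow")
    receiver = stack[-(n_args + 1)]
    if not (isinstance(receiver, tuple) and receiver[:2] == ("uninit", cls)):
        raise VerifyError(f"receiver is not an uninitialized {cls}")
    rest = stack[:-(n_args + 1)]

    def initialize(t):
        return cls if t == receiver else t

    return [initialize(t) for t in rest], [initialize(t) for t in regs]


# Replay of the listing for new C(new C(null)).
stack = [uninit("C", 0)]                 # 0: new C
stack = stack + [stack[-1]]              # 3: dup
stack = stack + [uninit("C", 4)]         # 4: new C
stack = stack + [stack[-1]]              # 7: dup
stack = stack + ["null"]                 # 8: aconst_null
after_first, _ = invoke_init(stack, [], "C", 1)          # 9
after_second, _ = invoke_init(after_first, [], "C", 1)   # 12
print(after_first, after_second)
```

The first call turns only the copies tagged with PC 4 into `C`; the two entries tagged with PC 0 stay uninitialized until the second call.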

This approach is correct only if at any given time, the machine state contains 
at most one uninitialized object created at a given PC. Loops containing a new 
instruction can invalidate this assumption, since several distinct objects created 
by this new instruction can be “in flight”, yet are given the same uninitialized
object type (same class, same PC of creation). To avoid this problem, Sun’s
verifier requires that no uninitialized object type appear in the machine state
when a backward branch is taken. Since a control-flow loop must take at least
one backward branch, this guarantees that no uninitialized objects can be carried
over from one loop iteration to the next one, thus ensuring the correctness of 
the “PC of creation” aliasing criterion. 

Freund and Mitchell 0 formalize this approach to verifying object initializa- 
tion. Bertot |2j proves the correctness of this approach using the Coq theorem 
prover, and extracts a verification algorithm from the proof.

5 Subroutines

Subroutines in the JVM are code fragments that can be called from several points 
inside the code of a method. To this end, the JVM provides two instructions: 
jsr branches to a given label in the method code and pushes a return address
to the following instruction; ret recovers a return address (from a register)
and branches to the corresponding instruction. Subroutines are used to compile
certain exception handling constructs, and can also be used as a general
code-sharing device. The difference between a subroutine call and a method invocation
is that the body of the subroutine executes in the same activation record as
its caller, and therefore can access and modify the registers of the caller.

5.1 The Verification Problem with Subroutines 

Subroutines significantly complicate bytecode verification by dataflow analysis.
First, it is not obvious how to determine the successors of a ret instruction, since
the return address is a first-class value. As a first approximation, we can say
that a ret instruction can branch to any instruction that follows a jsr in the
method code. (This approximation is too coarse in practice; we will describe
better approximations later.) Second, the subroutine entry point acts as a merge
point in the control-flow graph, causing the register types at the points of call
to this subroutine to be merged. This can lead to excessive loss of precision in
the register types inferred, as the example in Fig. 5 shows.



                       // register 0 uninitialized here
      0: jsr 100       // call subroutine at 100
      3: ...
     50: iconst_0
     51: istore_0      // register 0 has type "int" here
     52: jsr 100       // call subroutine at 100
     55: iload_0       // load integer from register 0
     56: ireturn       // and return to caller
         ...
                       // subroutine at 100:
    100: astore_1      // store return address in register 1
    101: ...           // execute some code that does not use register 0
    110: ret 1         // return to caller



Fig. 5. An Example of Subroutine 



The two jsr 100 at 0 and 52 have 100 as successor. At 0, register 0 has type 
⊤; at 52, it has type int. Thus, at 100, register 0 has type ⊤ (the least upper
bound of ⊤ and int). The subroutine body (between 101 and 110) does not
modify register 0, hence its type at 110 is still ⊤. The ret 1 at 110 has 3 and
55 as successors (the two instructions following the two jsr 100). Thus, at 55,
register 0 has type ⊤ and cannot be used as an integer by instructions 55 and
56. This code is therefore rejected. 

This behavior is counter-intuitive. Calling a subroutine that does not use a 
given register does not modify the run-time value of this register, so one could 
expect that it does not modify the verification-time type of this register either. 
Indeed, if the subroutine body were expanded inline at the two jsr sites, bytecode
verification would succeed as expected. 

The subroutine-based compilation scheme for the try...finally construct
produces code very much like the above, with a register being uninitialized at 
one call site of the subroutine and holding a value preserved by the subroutine at 
another call site. Hence it is crucial that similar code passes bytecode verification. 
We will now see two refinements of the dataflow-based verification algorithm that 
achieve this goal. 

5.2 Sun’s Solution 

We first describe the approach implemented in Sun’s JDK verifier. It is described 
informally in the JVM specification (section 4.9.6), and formalized in [29, 25]. 
This approach implements the intuition that a call to a subroutine should not 
change the types of registers that are not used in the subroutine body. 

First, we need to make precise what a “subroutine body” is: since JVM 
bytecode is unstructured, subroutines are not syntactically delimited in the code; 
subroutine entry points are easily detected (as targets of jsr instructions), but it 
is not immediately apparent which instructions can be reached from a subroutine 
entry point. Thus, a dataflow analysis is performed, either before or in parallel 
with the main type analysis. The outcome of this analysis is a consistent labeling 
of every instruction by the entry point(s) for the subroutine(s) it logically belongs 
to. From this labeling, we can then determine, for each subroutine entry point ℓ, 
the return instruction Ret(ℓ) for the subroutine, and the set of registers Used(ℓ) 
that are read or written by instructions belonging to that subroutine. 

The dataflow equation for subroutine calls is then as follows. Let i be 
an instruction jsr ℓ, and j be the instruction immediately following i. Let 
$(S_{jsr}, R_{jsr}) = out(i)$ be the state “after” the jsr, and 
$(S_{ret}, R_{ret}) = out(Ret(\ell))$ be the state “after” the ret that 
terminates the subroutine. Then: 

\[ in(j) = \bigl(S_{ret},\ \{\, r \mapsto R_{ret}(r) \text{ if } r \in Used(\ell);\ \ r \mapsto R_{jsr}(r) \text{ if } r \notin Used(\ell) \,\}\bigr) \]

In other terms, the state “before” the instruction j following the jsr is identical 
to the state “after” the ret, except for the types of the registers that are not 
used by the subroutine, which are taken from the state “after” the jsr. 

In the example above, we have Ret(100) = 110 and register 0 is not in 
Used(100). Hence the type of register 0 before instruction 55 (the instruction 
following the jsr) is equal to the type after instruction 52 (the jsr itself), that 
is int, instead of ⊤ (the type of register 0 after the ret 1 at 110). 
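
As an illustration of this rule, the register part of the state before the instruction following a jsr can be computed as below. This is a minimal sketch, not Sun's code: the function name state_after_call and the dictionary representation are assumptions, and the stack component (taken unchanged from the state after the ret) is left out.

```python
TOP, INT, RETADDR = "TOP", "int", "retaddr"

def state_after_call(r_jsr, r_ret, used):
    """Register types before the instruction that follows a jsr.

    r_jsr: register types after the jsr
    r_ret: register types after the ret that ends the subroutine
    used:  registers read or written by the subroutine, i.e. Used(l)
    """
    return {r: (r_ret[r] if r in used else r_jsr[r]) for r in r_jsr}

# Fig. 5: the subroutine at 100 only uses register 1.
used_100 = {1}
r_ret_110 = {0: TOP, 1: RETADDR}   # state after ret 1, merged over both call sites
r_jsr_52 = {0: INT, 1: TOP}        # state after the jsr at 52

before_55 = state_after_call(r_jsr_52, r_ret_110, used_100)
```

Register 0 keeps the type int it had at the call site, while register 1, which the subroutine does use, takes its type from the state after the ret.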




While effective in practice, Sun’s approach to subroutine verification raises 
a challenging issue: determining the subroutine structure is difficult. Not only 
are subroutines not syntactically delimited, but return addresses are stored in 
general-purpose registers rather than on a subroutine-specific stack, which makes 
tracking return addresses and matching ret/jsr pairs more difficult. To facilitate 
the determination of the subroutine structure, the JVM specification states 
a number of restrictions on correct JVM code, such as “two different subroutines 
cannot ‘merge’ their execution to a single ret instruction” (section 4.9.6). 
These restrictions seem rather ad hoc and specific to the particular subroutine 
labeling algorithm that Sun’s verifier uses. Moreover, the description of subroutine 
labeling given in the JVM specification is very informal and incomplete. 

Several rational reconstructions of this part of Sun’s verifier have been 
published. The first, due to Abadi and Stata, is presented as a non-standard 
type system, and determines the subroutine structure before checking the types. 
The second, due to Qian, infers the types and the subroutine structure 
simultaneously, in a way that is closer to Sun’s implementation. The 
simultaneous determination of types and Used(ℓ) sets complicates the dataflow 
analysis: the transfer function of the analysis is no longer monotone, and 
special iteration strategies are required to reach the fixpoint. Finally, O’Callahan 
and Hagiya and Tozawa also give non-standard type systems for subroutines, 
based on continuation types and context-dependent types, respectively. 
However, these papers give only type checking rules, but no effective verification 
(type inference) algorithms. 

While these works shed considerable light on the issue, they are carried out in 
the context of a small subset of the JVM that excludes exceptions and object 
initialization in particular. Delicate interactions between subroutines and object 
initialization were discovered later by Freund and Mitchell, exposing a bug in 
Sun’s verifier. As for exceptions, exception handling significantly complicates the 
determination of the subroutine structure. Examination of bytecode produced by 
Java compilers shows two possible situations: either an exception handler covers a 
range of instructions entirely contained in a subroutine, in which case the code 
of the exception handler should be considered as part of the same subroutine 
(e.g. it can branch back to the ret instruction that terminates the subroutine); 
or an exception handler covers both instructions belonging to a subroutine and 
non-subroutine instructions, in which case the code of the handler should be 
considered as outside the subroutine. The problem is that in the second case, we 
have a branch (via the exception handler) from a subroutine instruction to a 
non-subroutine instruction, and this branch is not a ret instruction; this situation 
is not allowed in Abadi and Stata’s subroutine labeling system. 

5.3 Polyvariant Dataflow Analysis 

An alternate solution to the subroutine problem, used in the Java Card off-card 
verifier, relies on a polyvariant dataflow analysis: instructions inside 
subroutine bodies are analyzed several times, once per call site for the subroutine. 
The principles of polyvariant flow analyses, also called context-sensitive analyses, 
are well known: whereas monovariant analyses maintain only 
one state per program point, a polyvariant analysis allows several states per 
program point. These states are indexed by contours that usually approximate 
the control-flow path that led to each state. 

In the case of bytecode verification, contours are subroutine call stacks: lists 
of return addresses for the jsr instructions that led to the corresponding state. 
In the absence of subroutines, all the bytecode for a method is analyzed in 
the empty contour. Thus, only one state is associated to each instruction and 
the analysis degenerates into the monovariant dataflow analysis of section 3.2. 
However, when a jsr ℓ instruction is encountered in the current contour c, it 
is treated as a branch to the instruction at ℓ in the augmented contour j.c, 
where j is the return address, that is, the address of the instruction following 
the jsr. Similarly, a ret r instruction is treated as a branch that restricts the 
current contour c by popping one or several return addresses from c (as 
determined by the type of the register r). 

In the example of Fig. 5, the two jsr 100 instructions are analyzed in the 
empty contour ε. This causes two “in” states to be associated with the 
instruction at 100; one has contour 3.ε, assigns type ⊤ to register 0, and contains 
retaddr(3), the type of a return address to instruction 3, at the top of the stack; 
the other state has contour 55.ε, assigns type int to register 0, and contains 
retaddr(55) at the top of the stack. Then, the instructions at 101 ... 110 are 
analyzed twice, in the two contours 3.ε and 55.ε. In the contour 3.ε, the ret 1 
at 110 is treated as a branch to 3, where register 0 still has type ⊤. In the 
contour 55.ε, the ret 1 is treated as a branch to 55 with register 0 still having 
type int. By analyzing the subroutine body in a polyvariant way, under two 
different contours, we avoided merging the types ⊤ and int of register 0 at the 
subroutine entry point, and thus obtained the desired type propagation behavior 
for register 0: ⊤ before and after the jsr 100 at 0, but int before and after the 
jsr 100 at 52. 

More formally, the polyvariant dataflow equation for a jsr ℓ instruction at i 
followed by an instruction at j is 

\[ in(\ell, j.c) = (\mathrm{retaddr}(j).S,\ T) \quad \text{where } (S, T) = out(i, c) \]

For a ret r instruction at i, the equation is 

\[ in(ra, c') = out(i, c) \]

where the type of register r in the state out(i, c) is retaddr(ra) and the contour 
c' is obtained from c by popping return addresses until ra is found, that is, 
c = c''.ra.c'. 
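
These two equations are enough to replay the analysis of Fig. 5. The sketch below is an illustration only, under several assumptions: the elided code of the figure is replaced by nop, types are limited to TOP, int and ("retaddr", n) tuples, and all names (CODE, step, analyze) are hypothetical rather than taken from the Java Card verifier.

```python
TOP, INT = "TOP", "int"

# Fig. 5 with the elided code replaced by "nop" (an assumption of this sketch).
# address -> (opcode, operand, address of the next instruction)
CODE = {
    0: ("jsr", 100, 3), 3: ("nop", None, 50),
    50: ("iconst_0", None, 51), 51: ("istore", 0, 52),
    52: ("jsr", 100, 55), 55: ("iload", 0, 56), 56: ("ireturn", None, None),
    100: ("astore", 1, 101), 101: ("nop", None, 110), 110: ("ret", 1, None),
}

def lub(a, b):
    return a if a == b else TOP

def step(pc, contour, stack, regs):
    """Abstractly execute the instruction at pc; return the successor states."""
    op, arg, nxt = CODE[pc]
    regs = dict(regs)
    if op == "jsr":
        # in(l, j.c) = (retaddr(j).S, T): push the return address, extend the contour
        return [(arg, (nxt,) + contour, (("retaddr", nxt),) + stack, regs)]
    if op == "ret":
        t = regs[arg]
        if not (isinstance(t, tuple) and t[1] in contour):
            raise TypeError(f"ret at {pc}: register {arg} holds no live return address")
        ra = t[1]
        # in(ra, c') = out(i, c), where c = c''.ra.c'
        return [(ra, contour[contour.index(ra) + 1:], stack, regs)]
    if op == "astore":
        regs[arg] = stack[0]
        return [(nxt, contour, stack[1:], regs)]
    if op == "iconst_0":
        return [(nxt, contour, (INT,) + stack, regs)]
    if op == "istore":
        if stack[:1] != (INT,):
            raise TypeError(f"istore at {pc}: expected int on the stack")
        regs[arg] = INT
        return [(nxt, contour, stack[1:], regs)]
    if op == "iload":
        if regs[arg] != INT:
            raise TypeError(f"iload at {pc}: register {arg} is not an int")
        return [(nxt, contour, (INT,) + stack, regs)]
    if op == "ireturn":
        if stack[:1] != (INT,):
            raise TypeError(f"ireturn at {pc}: expected int on the stack")
        return []
    return [(nxt, contour, stack, regs)]  # nop

def analyze(nregs=2):
    """Worklist iteration keeping one state per (program point, contour)."""
    states = {}
    work = [(0, (), (), {r: TOP for r in range(nregs)})]
    while work:
        pc, contour, stack, regs = work.pop()
        old = states.get((pc, contour))
        if old is not None:
            # Merge with the previous state (stack heights assumed equal).
            stack = tuple(lub(a, b) for a, b in zip(old[0], stack))
            regs = {r: lub(old[1][r], regs[r]) for r in regs}
            if (stack, regs) == old:
                continue
        states[(pc, contour)] = (stack, regs)
        work.extend(step(pc, contour, stack, regs))
    return states

states = analyze()
```

The instruction at 100 ends up with two states, one per contour, and the iload_0 at 55 is checked with register 0 still at type int, so the method is accepted.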

Another way to view polyvariant verification is that it is exactly equivalent 
to performing monovariant verification on an expanded version of the bytecode 
where every subroutine call has been replaced by a distinct copy of the subroutine 
body. Instead of actually taking N copies of the subroutine body, we analyze 
them N times in N different contours. Of course, duplicating subroutine bodies 
before the monovariant verification is not practical, because it requires prior 
knowledge of the subroutine structure (to determine which instructions are part 
of which subroutine body), and as shown in section 5.2, the subroutine structure 
is hard to determine exactly. The beauty of the polyvariant analysis is that 
it determines the subroutine structure along the way, via the computations on 
contours performed during the dataflow analysis. Moreover, this determination 
takes advantage of typing information such as the retaddr(ra) types to determine 
with certainty the point to which a ret instruction branches in case of 
early return from nested subroutines. 

(Footnote: the type retaddr(i) represents a return address to the instruction at i.) 

Another advantage of polyvariant verification is that it has no problem dealing 
with code that is reachable both from subroutine bodies and from the main 
program, such as the exception handlers mentioned at the end of section 5.2: 
rather than deciding whether such exception handlers are part of a subroutine 
or not, the polyvariant analysis simply analyzes them several times, once in the 
empty contour and once or several times in subroutine contours. 

The downside of polyvariant verification is that it is more computationally 
expensive than Sun’s approach. In particular, if subroutines are nested to depth 
N, and each subroutine is called k times, the instructions from the innermost 
subroutine are analyzed $k^N$ times instead of only once in Sun’s algorithm. However, 
typical Java code has low nesting of subroutines: most methods have N ≤ 1, very 
few have N = 2, and N > 2 is unheard of. Hence, the extra cost of polyvariant 
verification is entirely acceptable in practice. 

6 Model Checking of Abstract Interpretations 

It is folklore that dataflow analyses can be viewed as model checking of abstract 
interpretations. Since a large part of bytecode verification is obviously an 
abstract interpretation (of a defensive JVM at the type level), it is natural to 
look at the remaining parts from a model-checking perspective. 

Posegga and Vogt [22] were the first to do so. They outline an algorithm that 
takes the bytecode for a method and generates a temporal logic formula that 
holds if and only if the bytecode is safe. They then use an off-the-shelf model 
checker to determine the validity of the formula. While this application uses 
only a small part of the power and generality of temporal logic and of the model 
checker, the approach sounds interesting for establishing finer properties of the 
bytecode that go beyond the basic safety properties of bytecode verification (see 
section 8). 

Unpublished work by Brisset extracts the essence of Posegga and Vogt’s 
approach: the idea of exploring all reachable states of the abstract interpreter. 
Brisset considers the transition relation obtained by combining the transition 
relation of the type-level abstract interpreter presented earlier with the 
“successor” relation between instructions. This relation is of the form 
(p, S, R) → (p', S', R'), meaning that the abstract interpreter, started at PC p 
with stack type S and register type R, can abstractly execute the instruction at p 
and arrive at PC p' with stack type S' and register type R'. 

Starting with the initial state $(0, \varepsilon, (P_0, \ldots, P_{n-1}, \top, \ldots, \top))$ 
corresponding to the method entry, we can then explore all states reachable by 
repeated applications of the transition function. If we encounter a state where the 
abstract interpreter is “stuck” (cannot make a transition because some check failed), 
verification fails and the bytecode is rejected. Otherwise, the correctness of the 
abstract interpretation guarantees that the concrete, defensive JVM interpreter 
will never get “stuck” either during the execution of the method code, hence the 
bytecode is safe. 

This algorithm always terminates because the number of distinct states is 
finite (albeit large), since there is a finite number of distinct types used in the 
program, and the height of the stack is bounded, and the number of registers is 
fixed. Brisset formalized and proved the correctness of this approach in the Coq 
proof assistant, and extracted the ML code of a bytecode verifier from the proof. 

This approach is conceptually interesting because it is the ultimate polyvariant 
analysis: rather than having one stack-register type per control point (as in 
Sun’s verifier), or one such type per control point and per subroutine contour 
(as in section 5.3), we can have arbitrarily many stack-register types per control 
point, depending on the number of control-flow paths that lead to this control 
point. Consider for instance the control-flow join depicted in Fig. 2. While the 
dataflow-based algorithms verify the instructions following the join point only 
once under the assumption $r : lub(C_1, C_2) = C$, Brisset’s algorithm verifies them 
twice, once under the assumption $r : C_1$, once under the assumption $r : C_2$. 
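
The exploration of reachable abstract states at such a join can be sketched as follows. This is a toy illustration, not Brisset's extracted verifier: the four-instruction method, the one-register state and the names SUBTYPE, successors and explore are all assumptions made for the example.

```python
# Hypothetical subtype relation: C1 and C2 are unrelated subclasses of C.
SUBTYPE = {("C1", "C1"), ("C2", "C2"), ("C", "C"), ("C1", "C"), ("C2", "C")}

def successors(state):
    """Transition relation of a toy method with a single register r.

    Returns the list of successor states, or None if the interpreter is stuck.
    """
    pc, r = state
    if pc == 0:                      # conditional branch to 1 or 2
        return [(1, r), (2, r)]
    if pc == 1:                      # r := new C1; goto 3
        return [(3, "C1")]
    if pc == 2:                      # r := new C2; fall through to 3
        return [(3, "C2")]
    if pc == 3:                      # use r as a C, then return
        return [] if (r, "C") in SUBTYPE else None
    raise ValueError(f"no instruction at {pc}")

def explore(initial):
    """Return the set of reachable states, or None if some state is stuck."""
    seen, work = {initial}, [initial]
    while work:
        nxt = successors(work.pop())
        if nxt is None:
            return None              # verification fails
        for s in nxt:
            if s not in seen:
                seen.add(s)
                work.append(s)
    return seen

reachable = explore((0, "TOP"))
```

The join point 3 appears twice in reachable, once with r : C1 and once with r : C2. Only subtype checks are performed; no least upper bound is ever computed.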

In other terms, this analysis is polyvariant not only with respect to subroutine 
calls, but to all conditional or N-way branches as well. This renders the analysis 
impractical, since it runs in time exponential in the number of such branches 
in the method. (Consider a control-flow graph with N conditional constructs in 
sequence, each assigning a different type to registers $r_1 \ldots r_N$; this causes the 
code following the last conditional to be verified $2^N$ times under $2^N$ different 
register types.) 
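
The count in the parenthetical can be checked directly. In this small sketch (the function name and the type names A and B are hypothetical), the i-th conditional assigns one of two types to register r_i, and we collect the distinct register typings that reach the code after the last conditional.

```python
def typings_after_conditionals(n):
    """Distinct register typings (r_1, ..., r_n) after n two-way conditionals."""
    typings = {()}
    for _ in range(n):
        # Each branch of the conditional extends every incoming typing.
        typings = {t + (ty,) for t in typings for ty in ("A", "B")}
    return typings

count_for_10 = len(typings_after_conditionals(10))
```

For ten conditionals in sequence this already yields 1024 typings, each of which triggers a separate verification of the code that follows.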

Of course, the precision of Brisset’s algorithm can be degraded by applying 
widening steps in order to reduce the number of states. Some transitions 
(pc, S, R) → (pc', S', R') can be replaced by (pc, S, R) → (pc', S'', R'') where 
R' <: R'' and S' <: S''. If the abstract interpreter is still not stuck on any of 
the reachable states, the bytecode remains safe. The monovariant dataflow 
analysis of section 3.2 corresponds to keeping only one state per program point by 
replacing multiple states by their least upper bounds. The polyvariant dataflow 
analysis of section 5.3 is similar, except that the merging of states into least 
upper bounds is relaxed for subroutines and controlled via contours. 
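
The monovariant extreme of this widening can be sketched as a function that collapses a set of reachable states into one state per program point. The class hierarchy PARENT and the names ancestors, lub and merge_per_pc are assumptions made for this example only.

```python
# Hypothetical single-inheritance hierarchy: C1 and C2 extend C, C extends Object.
PARENT = {"C1": "C", "C2": "C", "C": "Object", "Object": None}

def ancestors(t):
    """The type t followed by its superclasses, most specific first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain

def lub(a, b):
    """Least common superclass of a and b."""
    above_a = ancestors(a)
    return next(t for t in ancestors(b) if t in above_a)

def merge_per_pc(states):
    """Widen a set of (pc, register types) states to one state per pc."""
    merged = {}
    for pc, regs in states:
        if pc in merged:
            merged[pc] = tuple(lub(x, y) for x, y in zip(merged[pc], regs))
        else:
            merged[pc] = regs
    return merged

widened = merge_per_pc({(3, ("C1",)), (3, ("C2",)), (4, ("C1",))})
```

The two states at program point 3 are replaced by the single state r : C, which is what the dataflow analysis would have computed at the join.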

Another interest of Brisset’s approach is that it allows us to reconsider some 
of the design decisions explained in earlier sections. For instance, Brisset’s 
algorithm never computes least upper bounds of types, but simply checks 
subtyping relations between types. Thus, it can be applied to any subtyping relation, 
not just relations that form a semi-lattice. Indeed, it can keep track of interface 
types and verify invokeinterface instructions accurately, without having to 
deal with sets of types or lattice completion. 

7 Bytecode Verification on Small Computers 

Java virtual machines run not only in personal computers and workstations, but 
also in a variety of embedded computers, such as personal digital assistants, mo- 
bile phones, and smart cards. Extending the Java model of safe post-issuance 
code downloading to these devices requires that bytecode verification be per- 
formed on the embedded system itself. However, bytecode verification is an ex- 
pensive process that exceeds the resources (processing power and memory space) 
of small embedded systems. For instance, a typical Java card (Java-enabled smart 
card) has 1 or 2 kilobytes of RAM and an 8-bit microprocessor that is 
approximately 1000 times slower than a personal computer. Fitting a bytecode verifier 
into one of these devices requires new verification algorithms, which we discuss 
now. 

7.1 Lightweight Bytecode Verification Using Certificates 

Inspired by Necula and Lee’s proof-carrying code, Rose and Rose propose 
to split bytecode verification into two phases: the code producer computes 
the stack and register types at branch targets and transmits these so-called 
certificates along with the bytecode; the embedded system, then, simply checks 
that the code is well-typed with respect to the types given in the certificates, 
rather than inferring these types itself. In other terms, the embedded system no 
longer solves iteratively the dataflow equations characterizing correct bytecode, 
but simply checks that the types provided in the code certificates are indeed a 
solution of these equations. 
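
The difference between inferring and checking can be made concrete with a single-pass checker. The sketch below is not Rose and Rose's algorithm: it assumes a toy instruction set (set_int, use_int, if_goto, return), a state made of the type of one register, and a certificate that maps every branch target to a type.

```python
TOP, INT = "TOP", "int"

def subtype(a, b):
    return a == b or b == TOP

def check(code, cert, entry_state=TOP):
    """One linear pass over the code; no fixpoint iteration."""
    state = entry_state
    for pc, (op, arg) in enumerate(code):
        if pc in cert:
            # Branch target: the incoming state must fit the certified type,
            # and verification continues from the certified type.
            if state is not None and not subtype(state, cert[pc]):
                return False
            state = cert[pc]
        if state is None:
            return False             # unreachable code without a certificate
        if op == "set_int":
            state = INT
        elif op == "use_int":
            if state != INT:
                return False
        elif op == "if_goto":
            # The state flowing along the branch must fit the target's certificate.
            if arg not in cert or not subtype(state, cert[arg]):
                return False
        elif op == "return":
            state = None             # no fall-through after a return
    return True

# A loop: instruction 2 branches back to instruction 1.
LOOP = [("set_int", None), ("use_int", None), ("if_goto", 1), ("return", None)]
ok = check(LOOP, {1: INT})
too_weak = check(LOOP, {1: TOP})
missing = check(LOOP, {})
```

The certificate {1: INT} is accepted; the certificate {1: TOP} is a solution of the merge constraints but too weak for the use at 1, and an empty certificate is rejected because the branch target is not annotated.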

The benefits of this approach are twofold. First, checking a solution is faster 
than inferring one, since we avoid the cost of the fixpoint iteration. This speeds 
up verification to some extent. Second, certificates are only read, but never 
modified during verification. Hence, they can be stored in persistent rewritable 
memory (EEPROM or Flash). Smart card-class embedded systems offer relatively 
large amounts of persistent memory (e.g. 16-32 kilobytes). Writing data 
to such memory is slow (1000-10000 times slower than reading from it), hence it 
is not possible to store there rapidly-changing data such as the fixpoint computed 
by a standard verification algorithm. However, Rose and Rose’s certificates are 
written only once, on reception of the bytecode, and only read during verification, 
so they can fit in the “comfortable” EEPROM memory space. 

There are two limitations to this approach. First, it is currently not known 
how to deal with subroutines in this framework. Indeed, Sun proposed to drop 
subroutines entirely in order to use Rose and Rose’s bytecode verification 
algorithm in the KVM, one of Sun’s embedded variants of the JVM. Second, 
certificates are relatively large: without compression, about the same size as 
the code they annotate; with compression, about 20% of the code size. Even if 
certificates are stored in persistent memory, they can still exceed the available 
memory space. 

(Footnote: the speedup is not as important as one might expect, since experiments 
show that the fixpoint is usually reached after examining every instruction at 
most twice.) 

7.2 On-Card Verification with Off-Card Code Transformation 

The Java Card bytecode verifier with off-card code transformation attacks the 
memory problem from another angle. Like the standard bytecode verifier, it solves 
dataflow equations using fixpoint iteration. To reduce memory requirements, 
however, it has only one global register type that is shared between all control 
points in the method. In other terms, the solution it infers is such that a given 
register has the same type throughout the method. For similar reasons, it also 
requires that the stack be empty at each branch instruction and at each branch target 
instruction. With these extra restrictions, bytecode verification can be done in 
space $O(M_{stack} + M_{reg})$, instead of $O(N_{branch} \times (M_{stack} + M_{reg}))$ 
for Sun’s algorithm, where $N_{branch}$ is the number of branch targets. In practice, 
the memory requirements are small enough that all data structures comfortably fit 
in RAM on a smart card. 

One drawback of this approach is that register initialization can no longer be 
checked statically, and must be replaced by run-time initialization of registers 
to safe values (0 or null) on method entrance. Another drawback is that the 
extra restrictions imposed by the on-card verifier cause perfectly legal bytecode 
(that passes Sun’s verifier) to be rejected. To address the latter issue, we rely 
on an off-card transformation, performed on the bytecode of the applet, that 
transforms any legal bytecode (that passes Sun’s verifier) into equivalent 
bytecode that passes the on-card verifier. The off-card transformations include stack 
normalizations around branches and register reallocation by graph coloring, and 
increase the size of the code by less than 2%. 

8 Conclusions and Perspectives 

Java bytecode verification is now a well researched technique, although it is still 
defined only by Sun’s reference implementation: all the formal works reviewed 
in this paper have not yet resulted in a complete formal specification of what it 
is and what it guarantees. 

A largely open question is whether bytecode verification can go beyond basic 
type safety and initialization properties, and statically establish more advanced 
properties of applets, such as resource usage (bounding the amount of memory 
allocated) and reactiveness (bounding the running time of an applet between 
two interactions with the outside world) . Controlling resource usage is especially 
important for Java Card applets: since Java Card does not guarantee the presence 
of a garbage collector, applets are supposed to allocate all the objects they need 
at installation time, then run in constant space. 

Other properties of interest include access control and information flow. 
Currently, the Java security manager performs all access control checks dynamically. 
Various static analyses and program transformations have been proposed to 
perform some of these checks statically. As for information flow (an applet 
does not “leak” confidential information that it can access), this property is 
essentially impossible to check dynamically; several type systems have been 
proposed to enforce it statically. 

Finally, the security of the sandbox model relies not only on bytecode 
verification, but also on the proper implementation of the API given to the applet. 
The majority of known applet-based attacks exploit (in a type-safe way) bugs 
in the API, rather than breaking type safety through bugs in the verifier. 
Verification of the API is a promising and largely open area of application for formal 
methods. 


Abstract. Regular languages have proved useful for the symbolic state 
exploration of infinite state systems. They can be used to represent infinite 
sets of system configurations; the transitional semantics of the system 
consequently can be modeled by finite-state transducers. A standard 
problem encountered when doing symbolic state exploration for infinite 
state systems is how to explore all states in a finite amount of time. When 
representing the one-step transition relation of a system by a finite-state 
transducer T, this problem boils down to finding an appropriate finite-state 
representation T* for its transitive closure. 

In this paper we give a partial algorithm to compute a finite-state 
transducer T* for a general class of transducers. The construction builds a 
quotient of an underlying infinite-state transducer $T^{<\omega}$ using a novel 
behavioural equivalence that is based on past and future bisimulations 
computed on finite approximations of $T^{<\omega}$. The extrapolation to 
$T^{<\omega}$ of these finite bisimulations capitalizes on the structure of the 
states of $T^{<\omega}$, which are strings of states of T. We show how this 
extrapolation may be rephrased as a problem of detecting confluence properties 
of rewrite systems that represent the bisimulations. Thus, we can draw 
upon techniques that have been developed in the area of rewriting. 

A prototype implementation has been successfully applied to various 
examples. 



1 Introduction 

Finite-state automata are omnipresent in computer science, providing a powerful 
tool for representing and reasoning about certain infinite phenomena. They are 
commonly used to capture dynamic behaviours, in which case an automaton’s 
nodes model the states, and its edges the possible state transitions of a system. 
More recently, finite-state automata have also been applied to reason about 
infinite-state systems, in which case a single automaton is used to represent an 
infinite set of system states. In regular model-checking, regular sets 
of states of the system to be verified are represented by finite-state automata. 
For instance, consider a parameterized network of finite-state processes with the 
states of the processes modeled by the symbols of a finite alphabet. Then for 
every value of the parameter, i.e., for every fixed size of the network, a global 
configuration is represented by a word over the alphabet. A set of similar 
configurations corresponding to different values of the parameter, and hence to 
different network sizes, can then be modelled by a regular set. Or, in a system with 
data structures like unbounded message buffers, infinitely many buffer contents may 
be represented by an automaton. To reason about the dynamic behaviour of such 
a system, its transition relation is lifted to operate on such symbolically 
represented sets of states. Finite-state transducers are a natural choice to represent 
the lifted transition relation. 

* This work has been supported by the Esprit-LTR project Vires. 

Taking finite-state automata and transducers to describe infinite sets of states 
and their operational evolution is, in general, not sufficient when doing state 
exploration. To capture all reachable states, one needs to characterize the effect 
of applying a transducer T an arbitrary number of times. In other words, one 
needs to compute the transitive closure of T. In general, this closure is not 
finite-state anymore. Nonetheless, for length-preserving transducers, partial algorithms 
have been developed that, if they terminate, produce the closure in the form of 
a finite-state transducer. These algorithms can be explained in terms of 
the, in general infinite-state, transducer $T^{<\omega}$, the union of all finite 
compositions of T. Conceptually, they attempt to construct a finite quotient 
of $T^{<\omega}$ by identifying states that are equivalent in some way. For example, 
in one of these works, the underlying equivalence relation is induced by 
determinizing on-the-fly and then minimizing $T^{<\omega}$. General transducers are 
not determinizable, but that paper considers length-preserving transducers, which 
are essentially standard automata over pairs of symbols and can be determinized 
accordingly. The minimal automaton is then approximated using a technique 
called saturation. 

In this paper, we employ a different quotient construction, resulting in an 
algorithm whose application is not a priori limited to length-preserving 
transducers. It works by computing successively the approximants 
$T^{\le n} = \bigcup_{0 \le i \le n} T^i$ for n = 0, 1, 2, ..., while attempting to 
accelerate the arrival at a fixpoint by collapsing states. This quotienting is based 
on a novel behavioural equivalence defined in terms of past and future bisimulations. 
The largest such equivalence, being a relation over the infinite-state transducer 
$T^{<\omega}$, may not be effectively computable. To solve this problem, we first 
identify sufficient conditions on an approximant $T^{\le n}$ for its states (which 
are also states of $T^{<\omega}$) to be equivalent as states of $T^{<\omega}$. Then 
we show that the equivalence of two states of $T^{\le n}$ induces the equivalence 
of infinitely many states of $T^{<\omega}$. 

We illustrate the underlying intuition on a small example in which sets of 
unbounded natural numbers are represented as automata over the symbols 0 and 
succ. The transitions we consider are given by the function α, defined inductively 
by α(0) = even and α(succ(x)) = ¬(α(x)). It computes the parity even or odd of 
a number; ¬ is a function that toggles parities. Consider the transition relation 
→ that corresponds to a single step in the evaluation of this recursive definition. 
Fig. 1(a) gives a transducer, $T_\alpha$, that represents this transition relation. 
The slash (/) is used to separate the input symbol from the output symbols. Note 
that by the self-loop on state 0, the transducer leaves any leading occurrences of 
the symbol ¬ unchanged, and similarly for the trailing occurrences of succ before 
the final 0. 

Fig. 1. Left (a): The Transducer $T_\alpha$. Right (b): Its Product $T_\alpha^2$. 
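
The single-step relation can be tried out on words directly. The sketch below assumes one particular word encoding (a list of symbols such as alpha, succ, 0, not, even) and rewrites at the unique occurrence of alpha; it illustrates the relation only and is not the transducer of Fig. 1.

```python
def step(word):
    """One evaluation step; returns None when no occurrence of alpha remains."""
    if "alpha" not in word:
        return None
    i = word.index("alpha")
    arg = word[i + 1]
    if arg == "succ":                       # alpha(succ(x)) -> not(alpha(x))
        return word[:i] + ["not", "alpha"] + word[i + 2:]
    if arg == "0":                          # alpha(0) -> even
        return word[:i] + ["even"] + word[i + 2:]
    raise ValueError(f"unexpected symbol after alpha: {arg}")

def trace(n):
    """All words visited while evaluating alpha(succ^n(0))."""
    word = ["alpha"] + ["succ"] * n + ["0"]
    steps = [word]
    while (word := step(word)) is not None:
        steps.append(word)
    return steps

def parity(n):
    """Toggle 'even' once per remaining 'not' in the final word."""
    final = trace(n)[-1]
    return "even" if final.count("not") % 2 == 0 else "odd"
```

For n = 2 the trace runs from alpha succ succ 0 through not alpha succ 0 and not not alpha 0 to not not even: each step moves alpha over one more succ, which is the effect that iterating the transducer has to capture.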

To start approximating $T_\alpha^{<\omega}$, consider the product transducer 
$T_\alpha^2$ shown in Fig. 1(b): It moves the symbol α over one more occurrence 
of succ, while turning it into ¬, as reflected by the edge from state 01 to 12 
(ε denotes the empty string). In every next product transducer 
$T_\alpha^3, T_\alpha^4, \ldots$, an additional such succ/¬-edge will appear. Clearly, 
the limit transducer $T_\alpha^{<\omega}$, the union of all approximants, is going to 
have infinitely many states. On the other hand, the combined effect of the 
ever-growing sequence of succ/¬-edges would be captured by a simple loop if states 
01 and 12 were identified. Collapsing $T_\alpha^{<\omega}$ in this way, we can hope 
for a finite quotient. To do so, we need to address the following questions: first, 
how can we justify equating pairs of states like 01 and 12 (they are obviously 
semantically different in that they realize different transductions), i.e., what is 
the equivalence notion on $T_\alpha^{<\omega}$ employed for quotienting; and second, 
how can we compute the quotient without prior calculation of the infinite 
$T_\alpha^{<\omega}$? 

As for the first point, we must require that identifying states in the quotient 
does not introduce transductions not already present in $T_\alpha^{<\omega}$. 
Equating 01 with 12 in the above example, consider the run through the “collapsed” 
transducer that goes from 00 to 01 (or rather to the new state obtained by 
collapsing 01 and 12) and then continues from this state as if continuing from 12. 
Exploiting the equation 01 = 12, this run is introduced by the collapse. Even if the 
states 01 and 12 are semantically different, as observed above, identifying them 
does not change the overall semantics of $T_\alpha^{<\omega}$, as there exists 
another state that “glues” together the past of 01 and the future of 12, namely 
state 1 of $T_\alpha$. Another class of artificial runs are those that go from 00 
to 12 and then continue as if continuing from 01. But also in this case, there is a 
state in $T_\alpha^{<\omega}$ that glues (this time) the past of 12 to the future of 
01, although it has not been constructed when considering $T_\alpha^{\le 2}$. This 
state is 012 and would enter the scene as part of $T_\alpha^3$, when constructing 
the next approximant. We formalize these ideas as follows: States $q_1$ and $q_2$ 
may be identified if there exists a past bisimulation P and a future bisimulation F 
such that the pair $(q_1, q_2)$ is both in the composed relation P;F and in F;P, 
thus ensuring the existence of both “gluing” states. Indeed, we will require that 
the bisimulations swap, i.e., F;P = P;F. So it will be enough to show that 
$(q_1, q_2)$ is in either one of the composed relations. 

The second question is how to know that two states in some approximant 
$T_\alpha^{\le n}$ are equivalent in the above sense, i.e., how do we know that 
there exists a state somewhere in $T_\alpha^{<\omega}$ that is past-bisimilar to one 
and future-bisimilar to the other? For this we exploit the structure of 
$T_\alpha^{<\omega}$ states, namely that they are sequences of states from 
$T_\alpha$. It is easily seen that bisimulations are congruences under juxtaposition 
of such sequences. In the example above, this means that we can conclude the 
existence of an appropriate state without actually having to construct 
$T_\alpha^3$. Namely, by looking at $T_\alpha^{\le 2}$ only, we see that 1 and 12 
are future bisimilar, whence by congruence also 01 and 012. Similarly, past 
bisimilarity of 12 and 012 can be inferred by comparing 1 and 01. So we know that 
012 is our candidate, without ever having constructed it in any approximant so far. 
In short, exploiting the congruence property allows us to extrapolate the 
quotienting relation found on a finite $T_\alpha^{\le n}$ to the whole 
$T_\alpha^{<\omega}$, and thus to obtain a finite quotient of $T_\alpha^{<\omega}$ 
without calculating the limit first. 

The remainder of the paper is organized as follows. After introducing notation and the relevant preliminary definitions in the next section, Section 3 will formalize the criterion for a sound quotient. An algorithm based on this and profiting from results of rewriting theory forms the topic of Section 4, where we will also report on the results obtained from our prototype implementation. Section 5 concludes and discusses related and future work.



2 Preliminaries 

A transducer $T = (Q, Q_i, Q_f, \Sigma, R)$ consists of a set $Q$ of states, sets $Q_i, Q_f \subseteq Q$ of initial resp. final states, a set $\Sigma$ of symbols, and a set $R$ of rules. A rule has the form $q\,a \to w\,q'$ with $q, q' \in Q$, $a \in \Sigma \cup \{\epsilon\}$, and $w \in \Sigma^*$, specifying that when in state $q$ and reading input symbol $a \in \Sigma$ (or reading no input in case $a = \epsilon$), the transducer produces output $w$ and assumes $q'$ as its new state. A finite-state transducer is also called regular. The operation of a transducer is captured by the reduction relation $\to_R$ on strings consisting of symbols and a state (where $\epsilon$ has its usual meaning as neutral element of concatenation), defined as follows: For $t_1, t_2 \in \Sigma^*$, $t_1\,q\,a\,t_2 \to_R t_1\,w\,q'\,t_2$ iff $q\,a \to w\,q' \in R$. For this and other arrows we use common notations like $\leftarrow$ for inverse, $\to^*$ for reflexive-transitive closure, and $\leftrightarrow$ for symmetric closure. $T$'s semantics $[T] : \Sigma^* \to 2^{\Sigma^*}$ is defined as follows: $t_2 \in [T](t_1)$ iff there exist $q_i \in Q_i$ and $q_f \in Q_f$ such that $q_i\,t_1 \to_R^* t_2\,q_f$. We will use the notation $\to_T$ synonymously for the rewrite relation $\to_R$.
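
The reduction relation can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: rules are encoded as tuples (q, a, w, q2) for q a -> w q2, the empty string stands for epsilon, and the three-state token-passing transducer in RULES is hypothetical (it is not taken from the figures of this paper). A step bound guards against epsilon loops.

```python
from collections import deque

def transduce(rules, initial, final, word, max_steps=10_000):
    """Return all outputs t2 such that  q_i word ->* t2 q_f  (bounded search)."""
    # A configuration (out, q, rest) encodes the string  out q rest.
    start = {("", q, word) for q in initial}
    seen, queue, results = set(start), deque(start), set()
    steps = 0
    while queue and steps < max_steps:
        out, q, rest = queue.popleft()
        steps += 1
        if rest == "" and q in final:
            results.add(out)
        for (p, a, w, p2) in rules:
            # a == "" encodes an epsilon rule, which reads no input.
            if p == q and rest.startswith(a):
                nxt = (out + w, p2, rest[len(a):])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return results

# Hypothetical example: pass the token 't' one position to the right.
RULES = {(0, "n", "n", 0), (0, "t", "n", 1), (1, "n", "t", 2), (2, "n", "n", 2)}
```

With these assumed rules, initial state 0 and final state 2, the word "tnn" is mapped to "ntn", while "nnt" has no image because the token cannot move past the right end.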

Transducers $T_1$ and $T_2$ over the same symbol set can be composed into $T_2 \circ T_1$, so that the output of $T_1$ is input to $T_2$. This is a standard product construction where the rules $R$ of the composition are defined by

$$\frac{q_j\,a \to_{R_1} v\,q'_j \qquad q_i\,v \to_{R_2}^* w\,q'_i}{q_{ij}\,a \to w\,q'_{ij} \in R}$$
where $R_1$ and $R_2$ are the rules of the two constituent transducers, and where we write $q_{ij}$ as short-hand for the tuple $(q_i, q_j)$. Note that multiple steps of $T_2$ may be needed for $q_i$ to “move through” $v$ (or none, if $v = \epsilon$). This construction captures the semantical composition, i.e., $[T_2] \circ [T_1] = [T_2 \circ T_1]$. The $n$-fold composition of a transducer $T$ with itself is written as $T^n$, with $T^0$ being defined such that it realizes the neutral element wrt. transduction composition, i.e., $[T^0] = Id_{\Sigma^*}$. By the same token, we will use $Q^n$ as the set of states of $T^n$, when $Q$ is the set of states of $T$. Finally we will need the union of transducers: given two transducers $T_1$ and $T_2$ over the same signature, $T_1 \cup T_2$ denotes the transducer over the same signature, given by the union of states, of initial states, of final states, and of rules, respectively, where we assume that the sets of states are disjoint. Union can be easily extended to the union of countably many transducers. Note that finite union preserves finiteness.

To obtain a finite-state transducer out of an a-priori infinite $T^{<\omega}$, we will
have to identify “equivalent” states. The notion of equivalence used to this end 
will be based on bisimulation equivalences mdi on states. Besides the standard 
future bisimulation we need the past variant as well. 

Definition 1 (Bisimulation). Let $T = (Q, Q_i, Q_f, \Sigma, R)$ be a transducer. An equivalence relation $F \subseteq Q \times Q$ is a future bisimulation if for all pairs $(q_1, q_2)$ of states, $q_1\,F\,q_2$ implies:

If $q_1 \in Q_f$, then $q_2 \in Q_f$, and for every $a, w, q'_1$ such that $q_1\,a \to_T w\,q'_1$, there exists $q'_2$ such that $q_2\,a \to_T w\,q'_2$ and $q'_1\,F\,q'_2$.

An equivalence relation $P \subseteq Q \times Q$ is a past bisimulation, if for all pairs $(q'_1, q'_2)$ of states, $q'_1\,P\,q'_2$ implies:

If $q'_1 \in Q_i$, then $q'_2 \in Q_i$, and for every $a, w, q_1$ such that $q_1\,a \to_T w\,q'_1$, there exists $q_2$ such that $q_2\,a \to_T w\,q'_2$ and $q_1\,P\,q_2$.

We call $q_1$ and $q_2$ (future) bisimilar, written $q_1 \sim_f q_2$, if there exists a future bisimulation $F$ with $q_1\,F\,q_2$; and $q_1 \sim_p q_2$ denotes two past bisimilar states, defined analogously.
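
For a finite transducer, the coarsest future and past bisimulations of Definition 1 can be computed by iterated refinement. The sketch below is a naive version written for illustration only (it is not the partition refinement algorithm used in the prototype); the rule encoding (q, a, w, q2) and the four-state example are assumptions.

```python
def future_bisimilarity(states, final, rules):
    """Coarsest future bisimulation, returned as a dict state -> block id."""
    block = {q: (q in final) for q in states}      # first split: final vs. non-final
    while True:
        # Signature: current block plus all (input, output, successor block) triples.
        sig = {
            q: (block[q],
                frozenset((a, w, block[q2]) for (p, a, w, q2) in rules if p == q))
            for q in states
        }
        ids = {}
        new_block = {q: ids.setdefault(sig[q], len(ids)) for q in states}
        # Each round refines the previous partition, so equal size means stable.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block

def past_bisimilarity(states, initial, rules):
    """Past bisimilarity is future bisimilarity on the reversed rules."""
    reversed_rules = {(q2, a, w, p) for (p, a, w, q2) in rules}
    return future_bisimilarity(states, initial, reversed_rules)

# Hypothetical example: states 1 and 2 behave identically in both directions.
STATES = {0, 1, 2, 3}
EX_RULES = {(0, "a", "b", 1), (0, "a", "b", 2), (1, "c", "c", 3), (2, "c", "c", 3)}
fb = future_bisimilarity(STATES, {3}, EX_RULES)
pb = past_bisimilarity(STATES, {0}, EX_RULES)
```

Two states are bisimilar exactly when they receive the same block id.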

The bisimulation relations enjoy the expected properties (uni) : For both the 
future and the past case, the identity relation is a bisimulation, the inverse 
of a bisimulation is one, as well, and the notion of bisimulation is closed under 
relational composition. Furthermore, the notions of bisimulation are closed under 
union, more precisely, given two future bisimulation relations $F_1$ and $F_2$, then $(F_1 \cup F_2)^*$ is a future bisimulation, and analogously for the past case. The extra Kleene closure is needed since we require a bisimulation relation to be an equivalence. It is standard to show that future bisimilarity implies semantical equality, i.e., $T_1 \sim_f T_2$ implies $[T_1] = [T_2]$, and that the two relations $\sim_f$ and $\sim_p$ are congruences on $Q^*$, the free monoid over $T$'s set of states $Q$. We will exploit this property in Section 4.

The definition of a quotient is fairly standard: 

Definition 2 (Quotient). Let $T = (Q, Q_i, Q_f, \Sigma, R)$ be a transducer and $\cong\ \subseteq Q \times Q$ an equivalence relation. $T/_{\cong}$ is defined as the transducer $(Q/_{\cong}, \{[q]_{\cong} \mid q \in Q_i\}, \{[q]_{\cong} \mid q \in Q_f\}, \Sigma, R/_{\cong})$, where $Q/_{\cong}$ is the set of $\cong$-equivalence classes of $Q$ and $[q]_{\cong}$ the $\cong$-equivalence class of $q$. The rules of $T/_{\cong}$ are given by $\hat{q}\,a \to w\,\hat{q}' \in R/_{\cong}$ iff there exist $q, q'$ such that $\hat{q} = [q]_{\cong}$, $\hat{q}' = [q']_{\cong}$, and $q\,a \to w\,q' \in R$.



3 Sound Quotienting of $T^{<\omega}$



Next we formalize the equivalence relation used to quotient $T^{<\omega}$ and show the correctness of the construction. As illustrated on the example of Section 1, the key intuition behind a sound quotient is that, whenever identifying states $q_1$ and $q_2$, there must exist a state realizing $q_1$'s future and $q_2$'s past, and a state realizing $q_1$'s past and $q_2$'s future. “Having the same future (past)” will be captured by being future (past) bisimilar. To ensure the existence of both required states, we will restrict our attention to swapping future and past bisimulations:

Definition 3 (Swapping). Two relations $R$ and $S$ over the same set swap (or: are swapping), if $R;S = S;R$ (where $;$ denotes relational composition).
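
On finite relations, Definition 3 can be checked directly. The sketch below represents relations as sets of pairs; the example relations are hypothetical and chosen so that one pair swaps and the other does not.

```python
def compose(r, s):
    """Relational composition r;s = {(x, z) | x r y and y s z for some y}."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def swap(r, s):
    """True iff r;s == s;r."""
    return compose(r, s) == compose(s, r)

ID = {(x, x) for x in (1, 2, 3)}
R = ID | {(1, 2), (2, 1)}
S = ID | {(2, 3), (3, 2)}
```

The identity swaps with every relation, whereas R and S above do not swap: R;S contains (1, 3) but S;R does not.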



We are now ready to formulate the section's central result, which allows us to collapse the infinite $T^{<\omega}$ to a (possibly) finite transducer without changing its semantics. Note that the theorem covers collapsing with respect to $\sim_f$ or with respect to $\sim_p$ as special cases, since the identity relation on $Q^*$ is a past as well as a future bisimulation and moreover, as neutral element of relational
composition, swaps with every relation. The full proof appears in 



Theorem 4. Let $T$ be a transducer, and $F$ and $P$ a swapping pair of a future and past bisimulation on $T^{<\omega}$. Then the quotient of $T^{<\omega}$ under $F;P$ is well-defined and preserves the transduction relation, i.e., $[T^{<\omega}/_{F;P}] = [T^{<\omega}]$.



Proof Sketch. With $F;P$ being a congruence, we will write $\cong_{F;P}$ for that relation in the rest of the proof.

The important direction to show is that $t' \in [T^{<\omega}/_{\cong_{F;P}}](t)$ implies $t' \in [T^{<\omega}](t)$ (the reverse implication is straightforward: Collapsing states never yields fewer transductions). To show this implication requires a characterization of the reductions realized by a quotient: Since for any congruence relation $\cong$, $T^{<\omega}/_{\cong}$ is given by identifying states of $T^{<\omega}$ while retaining the reduction relation of $T^{<\omega}$ (modulo the collapsing of states), the possible reduction steps of $T^{<\omega}/_{\cong}$ are either reduction steps from $T^{<\omega}$ or steps replacing a word by a congruent one, i.e., $[t_1]_{\cong} \to^*_{T^{<\omega}/_{\cong}} [t_2]_{\cong}$ iff $t_1\,(\to_{T^{<\omega}} \cup \cong)^*\,t_2$. Using this characterization for the congruence $\cong_{F;P}$, the above implication can be phrased as (and generalized, for sake of induction, to) the following requirement:




If $t_1\,(\to_{T^{<\omega}} \cup \cong_{F;P})^*\,t_2$, then there exist words $t'_1$ and $t'_2$ such that $t'_1 \to^*_{T^{<\omega}} t'_2$, and furthermore $t_1 \cong_{F;P} t'_1$ and $t_2 \cong_{F;P} t'_2$.
This property is shown by induction on the length of the reduction sequence. Distinguishing in the induction step between a congruence step $t_1 \cong_{F;P} t_3$ and a reduction step $t_1 \to_{T^{<\omega}} t_3$, both cases are solved by straightforward induction, where the second one (cf. the above diagram) exploits the assumption that $\cong_{F;P}$ is swapping.

To see that the result follows from the above implication, use the soundness observation for the unquotiented transducer, that $t' \in [T^{<\omega}](t)$ iff $t' \in [T^k](t)$ for some $k \in \omega$, and specialize $t_1$ resp. $t_2$ to $q_i\,t$ resp. to $t'\,q_f$, where $t, t' \in \Sigma^*$, and furthermore $q_i \in Q_i$ and $q_f \in Q_f$. □



4 An On-the-Fly Algorithm for Quotienting 

To make algorithmic use of the quotienting result, we must be able to effectively compute (and represent) swapping bisimulation relations on $T^{<\omega}$. In this section, we show how to obtain these by extrapolating from information established on a finite approximant $T^{\le n}$, and exploiting the structure of $T^{<\omega} = T^0 \cup T(T^0) \cup T(T(T^0)) \cup \ldots$. To apply Theorem 4 we must extrapolate two properties: 1) the (future or past) bisimulation requirement, and 2) the property of swapping. In order to do the extrapolation, we will view the relations $F$ and $P$ on $Q^{\le n}$ as rewriting systems on $Q^*$, indeed a restricted form of ground (i.e., without variables) rewriting systems on strings. We will draw upon various standard notions and results from rewrite theory, briefly recalling them as they occur. A detailed treatment of the field can be found in e.g. 0. The basic notions can be illustrated on our running example. The fact that states 1 and 12 are future bisimilar is rephrased by assuming two rewrite rules, one saying that 1 may be rewritten into 12, and another saying that 12 may be rewritten into 1. We use $\to_F$ to denote the rewrite relation generated by $F$, i.e., for $\alpha, \alpha' \in Q^*$, we have $\alpha \to_F \alpha'$ if $\alpha = \alpha_l \beta \alpha_r$, $\alpha' = \alpha_l \beta' \alpha_r$, and $(\beta, \beta') \in F$ for some $\alpha_l, \alpha_r, \beta, \beta' \in Q^*$; similarly for $\to_P$. The relations $\leftrightarrow^*_F$ and $\leftrightarrow^*_P$ denote the congruence closure¹ of $F$ and $P$ over the monoid $Q^*$ of strings over $Q$.

We first address question 1) from above. As mentioned in Section 2, the future and past bisimilarity relations are congruences over the monoid $Q^*$, i.e., if $\alpha \sim_f \alpha'$ and $\beta \sim_f \beta'$, then $\alpha\beta \sim_f \alpha'\beta'$, for all $\alpha, \alpha', \beta, \beta' \in Q^*$, and similarly for $\sim_p$. Based on the congruence property, the following lemma expresses the required extrapolation of bisimulation relations from a finite approximant to $T^{<\omega}$.

Lemma 5. Let $T$ be a finite-state transducer with states $Q$ and, for some $n \ge 0$, let $F$ and $P \subseteq Q^{\le n} \times Q^{\le n}$ be a future and a past bisimulation on $T^{\le n}$. Then the relation $\leftrightarrow^*_F$, resp. $\leftrightarrow^*_P$, is a future, resp. a past, bisimulation on $T^{<\omega}$.

After having extended finite bisimulations $F$ and $P$, question 2) is whether $\leftrightarrow^*_F$ and $\leftrightarrow^*_P$ additionally enjoy the swapping requirement. Now, reducing properties of a many-step rewrite relation to properties of the one-step relation is a standard topic in rewrite theory. First note that swapping of relations is closely related to the notion of commutation: $R$ and $S$ commute if $(R^{-1})^*; S^* \subseteq S^*; (R^{-1})^*$ (note the transitive closures). Now for symmetric relations, clearly $R$ and $S$ commute iff $R^*$ and $S^*$ swap. The following lemma (see e.g. [2]) reduces commutation to the commuting-diamond property: $R$ and $S$ have the commuting-diamond property if $R^{-1}; S \subseteq S; R^{-1}$.

¹ As $F$ and $P$ are symmetric, taking the symmetric closure has no effect, but we still prefer to write $\leftrightarrow_F$ and $\leftrightarrow_P$ instead of $\to_F$ and $\to_P$ in order to stress that they are symmetric.

Lemma 6. Let $F$ and $P$ be two relations on $Q^{\le n} \times Q^{\le n}$. If $\leftrightarrow_F$ and $\leftrightarrow_P$ have the commuting-diamond property², then they commute.

To effectively identify cases where the (infinite) relations $\leftrightarrow_F$ and $\leftrightarrow_P$ have the commuting-diamond property, one can restrict attention to the so-called critical pairs. Consider rewrite rules $(\alpha_F, \beta_F)$ and $(\alpha_P, \beta_P)$ from $F$ and $P$ respectively, such that $\alpha_F$ overlaps with $\alpha_P$ in the following way: either $\gamma_1 \alpha_F = \alpha_P \gamma_2$ with $|\gamma_1| < |\alpha_P|$, or $\alpha_F = \gamma_1 \alpha_P \gamma_2$, for some $\gamma_1, \gamma_2 \in Q^*$. Then the corresponding critical pair is defined as $(\gamma_1 \beta_F, \beta_P \gamma_2)$ in the first case and $(\beta_F, \gamma_1 \beta_P \gamma_2)$ in the second. Now, in order to check whether $\leftrightarrow_F$ and $\leftrightarrow_P$ have the commuting-diamond property, it suffices to check, for every³ such critical pair $(\delta_F, \delta_P)$, whether there exists $\delta$ such that $\delta_F \leftrightarrow_P \delta$ and $\delta_P \leftrightarrow_F \delta$. As the rewrite systems $F$ and $P$ are finite, there are also only finitely many critical pairs to check. Note that this technique offers only a sufficient condition for the commuting-diamond property.
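
The critical-pair computation can be sketched as follows. The code follows the two overlap cases exactly as stated above and makes the simplifying assumption that every state is a single character, so that elements of Q* are ordinary strings; function names are illustrative.

```python
def critical_pairs(F, P):
    """Critical pairs between ground string-rewriting systems F and P.

    F and P are sets of (lhs, rhs) pairs of strings.
    """
    pairs = set()
    for (aF, bF) in F:
        for (aP, bP) in P:
            # Case 1: g1 + aF == aP + g2 with len(g1) < len(aP).
            for k in range(len(aP)):
                tail = aP[k:]
                if aF.startswith(tail):
                    g1, g2 = aP[:k], aF[len(tail):]
                    pairs.add((g1 + bF, bP + g2))
            # Case 2: aF == g1 + aP + g2.
            start = aF.find(aP)
            while start != -1:
                g1, g2 = aF[:start], aF[start + len(aP):]
                pairs.add((bF, g1 + bP + g2))
                start = aF.find(aP, start + 1)
    return pairs

def rewrite_once(rules, s):
    """All strings obtained from s by one application of one rule."""
    out = set()
    for (lhs, rhs) in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def closes(F, P, pair):
    """True iff the diamond opened by the critical pair can be closed in one step."""
    dF, dP = pair
    return bool(rewrite_once(P, dF) & rewrite_once(F, dP))
```

For the rules (1, 12) of F and (1, 01) of P from the running example, the only critical pair is (12, 01), and it is closed by the common reduct 012.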

Lemma 5 and Lemma 6 together now allow us to apply the quotienting Theorem 4 and do the desired extrapolation.

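Corollary 7 (Soundness). Let $T$ be a transducer with states from $Q$ and, for some $n$, let $F \subseteq Q^{\le n} \times Q^{\le n}$ and $P \subseteq Q^{\le n} \times Q^{\le n}$ be a future resp. a past bisimulation on $T^{\le n}$. If $\leftrightarrow_F$ and $\leftrightarrow_P$ have the commuting-diamond property, then $[T^{<\omega}] = [T^{<\omega}/_{\leftrightarrow^*_F ; \leftrightarrow^*_P}]$.

To make notation a little less heavy-weight, we will for the rest use $\cong$ to abbreviate the congruence relation $\leftrightarrow^*_F ; \leftrightarrow^*_P$.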

Let us illustrate the ideas so far on the transducer from Fig. 1. On the approximant $T^{\le 2}$ (i.e. the unions of the transducers in parts (a) and (b) of Fig. 1), one pair of a future and a past bisimulation (represented as rewriting systems) is $F = \{(12, 1), (1, 12), (22, 2), (2, 22)\} \cup Id_{\{0,1,\ldots,22\}}$ and $P = \{(00, 0), (0, 00), (01, 1), (1, 01)\} \cup Id_{\{0,1,\ldots,22\}}$, where $Id_S$ denotes the “identity rewrite system” on $S$. Indeed, these bisimulations are the largest choices. It can be easily checked that the corresponding rewrite relations $\leftrightarrow_F$ and $\leftrightarrow_P$ have the commuting-diamond property. For example, the overlapping pair consisting of state 1 from the rule (1, 12) of $F$ and state 1 from the rule (1, 01) of $P$ opens a diamond that may be closed again by rewriting both 12 and 01 to 012 (using the same rules). Now, without actually attempting to fully compute the relation

² In fact, a weaker property suffices, called strong commutation: $R^{-1}; S \subseteq (S \cup Id); (R^{-1})^*$.

³ There is a similar condition in case strong commutation is used.
input T = (Q, Q_0, Σ, R)

X := T_id;

repeat
    X := (T ∘ X) ∪ T_id;
    determine bisimulations F and P on X s.t.
        ↔_F and ↔_P swap and each possess the diamond property;
until X/≅ = (T ∘ X/≅) ∪ T_id

Fig. 3. Calculating T*



$\cong$, we can already detect several equivalences between states. Most importantly, the states 1, 01, and 12 belong to the same equivalence class. Furthermore, we have $00 \cong 0$ and $22 \cong 2$. Quotienting $T^{\le 2}$ by this equivalence gives the transducer of Fig. 2 where only the relevant part is shown. It can be checked that the construction stabilizes at this point, so we have arrived at $T^*$. Note that quotienting $T^{<\omega}$ using $\sim_f$ or $\sim_p$ in isolation does not give a finite quotient.

The algorithm based on these ideas is sketched in pseudo-code in Fig. 3. Given a transducer $T = (Q, Q_0, \Sigma, R)$, the until-loop iteratively calculates, in variable $X$, the approximations $T^{\le n}$. On each approximation, bisimulations $F$ and $P$ are computed by a partition refinement algorithm [T7I5I . Note that in the termination condition, the approximant transducer $X$ is quotiented using the whole equivalence $\cong\ = \leftrightarrow^*_F ; \leftrightarrow^*_P$ and not just by those identifications that happen to be directly detectable on $X$, as suggested in the example above. The ability to do so relies again on techniques from rewrite theory. First, it can be shown that $\cong\ = (\leftrightarrow_F \cup \leftrightarrow_P)^* = \leftrightarrow^*_{F \cup P}$. So, the question is when strings are congruent under the rewrite system $F \cup P$. The first answer of rewrite theory is: If this system is confluent, i.e., commutes with itself, and terminating, i.e., allows no infinite sequences of rewrite steps, then strings are congruent iff they rewrite to the same normal forms. This obviously gives a procedure to determine congruence. Being a special case of commutation, confluence of $F \cup P$ can be checked using Lemma 6 by inspecting critical pairs. In practice, we can avoid duplicating work by the following standard result.

Lemma 8. If $\leftrightarrow_F$ and $\leftrightarrow_P$ commute, then $\leftrightarrow_F \cup \leftrightarrow_P$ is confluent if each of $\leftrightarrow_F$ and $\leftrightarrow_P$ in separation is confluent.

So, if commutation of $\leftrightarrow_F$ and $\leftrightarrow_P$ has already been checked when determining whether $\leftrightarrow^*_F$ and $\leftrightarrow^*_P$ swap, then it suffices to check confluence of the individual relations. In case $\leftrightarrow_F \cup \leftrightarrow_P$ turns out to be not confluent, still not all hope is lost. The next, more advanced technique offered by rewrite theory is to try to turn the rewrite system $F \cup P$ into an equivalent rewrite system
that is confluent, using so-called Knuth-Bendix completion (ISl; we refer to ^ 
for details. 

As for checking termination: it is clear that the relations $F$ and $P$ in separation are already non-terminating, as they are reflexive and symmetric. But also in this case, there is the possibility of turning $F \cup P$ into an equivalent system that does terminate. Because of the very simple form of this rewriting system (ground rewriting on strings), it is easy to capture $\leftrightarrow^*_{F \cup P}$ by a terminating one: Just order pairs lexicographically and remove the “reflexive” part $Id_{Q^{\le n}}$. In our example, the quotienting relation $\cong$ can in this way be represented by the four rules {(00, 0), (01, 1), (12, 1), (22, 2)}, where the right-hand side of each rule is strictly smaller than the corresponding left-hand side in lexicographic order.
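
Once the rewrite system is terminating and confluent, congruence reduces to comparing normal forms. The sketch below assumes single-character state names and uses the oriented rules (00, 0), (01, 1), (12, 1), (22, 2), i.e. the non-reflexive pairs of F and P from the running example with the longer side on the left; the function names are illustrative.

```python
ORIENTED = [("00", "0"), ("01", "1"), ("12", "1"), ("22", "2")]

def normal_form(word, rules=ORIENTED):
    """Rewrite until no rule applies; terminates because every rule shrinks the word."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)   # apply one rewrite step
                changed = True
    return word

def congruent(u, v, rules=ORIENTED):
    """Decide congruence by comparing normal forms (sound if the rules are confluent)."""
    return normal_form(u, rules) == normal_form(v, rules)
```

With these rules the states 1, 01, 12 and 012 all normalize to 1, whereas 0 and 2 stay distinct.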

A few points concerning the implementation deserve mention. For one, the naive iteration as sketched in the pseudo-code can be optimized in a number of ways, especially by reusing information collected from the lower approximants when treating $T^{\le n+1}$. For instance, in case one knows already that (00, 0) are past bisimilar after investigating the first two levels, as in our example, there is no need to check (000, 00) for past-bisimilarity at the third (if at all it would be needed to construct that level). Another, more tricky point is that the search for bisimulations $F$ and $P$ under the additional requirements of swapping and confluence adds an element of non-determinism to the process. Namely, it may be that bisimulations as they are found do not swap or are not confluent, but that smaller bisimulations would in fact satisfy these requirements. In such a case we would have to choose which pairs of states to delete. However, in the examples we tested, the largest bisimulations $\sim_f$ and $\sim_p$, as given by the partition refinement, always worked.

We tested our implementation on various examples, for instance the one of 
Fig. m or the token array example of m- In all but one case, the transitive 
closure was computed in a short time on a standard desktop workstation. In the 
remaining case, a ring configuration of the token array, the computation took 
too long. We expect that by implementing some additional optimizations (see 
below), this and other, larger transducers can be successfully handled. 

5 Conclusions, Related Work, and Future Work 

We presented a partial algorithm for computing the transitive closure of regular word transducers. This algorithm allows one to reason about the effect of iterating transduction relations an unbounded number of times. Such relations are used, for instance, in regular model checking where they represent the transition relation of an infinite-state system. Given a transducer $T$, our algorithm is based on quotienting, w.r.t. the composition of a future and a past bisimulation, the possibly infinite-state transducer $T^{<\omega}$, the union of all finite compositions of $T$.

To be able to develop our algorithm, we presented sufficient conditions that allow us to exploit bisimulations discovered on a finite approximant $T^{\le n}$, and hence, to avoid constructing $T^{<\omega}$. Though our prototype implementation can be improved in several ways, we obtained encouraging results on the examples we have considered.

In order to compute $T^*(S)$ for a given regular set $S$, our results specialize to automata, allowing us to accelerate the computation of $T^{\le 1}(S)$, $T^{\le 2}(S)$, $T^{\le 3}(S)$, .... This problem, where the set of initial configurations is also a parameter of the algorithm, can be solved in more cases than the general case.

Closest to our work is pitl I . which presents an algorithm using standard 
subset-construction and minimization techniques from automata theory. Suffi- 
cient conditions for termination of the algorithm are identified. Roughly speak- 
ing, our algorithm and the one from start from opposite extremes. Our 

algorithm starts from $T$ and tries to compute a finite quotient of $T^{<\omega}$. Their
algorithm starts from the initial state of which can be represented by the 
regular language and tries to compute the states of performing a for- 
ward symbolic reachability analysis (this is the determinization) while relaxing 
the condition stating when a state has already been visited. This relaxation 
(called saturation in their work) assumes a fixed set of equivalences between 
states of . On the contrary, our algorithm tries to discover such equivalences 
dynamically, i.e., during execution. Now, an important assumption in their approach is that the set of pairs $(a, w) \in \Sigma \times \Sigma^*$ that occur along the edges of $T^{<\omega}$ is finite and known in advance (or at least a finite super-set must satisfy these conditions). In case $T$ is a “letter-to-letter” transducer, only pairs from $\Sigma \times \Sigma$ may occur in $T^{<\omega}$ and hence, the assumption is satisfied. However, for non-length-preserving transducers the assumption is in general not satisfied.

Besides the improvements mentioned in Section 4 and implementation improvements like using BDDs to represent transducers, we believe that there are variations of our algorithm that are worth studying. One such variation consists in computing at each iteration of the algorithm the composition of $T$ with the quotiented transducer obtained up to that iteration. This would reduce the
number of states of the transducers that occur as intermediate results of the 
algorithm. A similar idea underlies what is called compositional model-checking, 
e.g. nni. The difficulty in our context lies in the generalization of the computed 
bisimulations to 

We are currently extending our results to the case of tree transducers. Here, in the general case, one is confronted with negative results from tree transducer theory, the main one being that regular tree transducers are not closed under composition. To avoid this problem, we restrict ourselves to linear tree transducers. A preliminary account, which also provides the full proofs for the word case, can be found in [5].
Abstract. We propose a new symbolic model checking algorithm for parameterized concurrent systems modeled as (Lossy) Petri Nets, and (Lossy) Vector Addition Systems, based on the following ingredients: a rich assertional language based on the graph-based symbolic representation of upward-closed sets introduced in [DR00], and the combination of the backward reachability algorithm of [ACJT96] lifted to the symbolic setting with a new heuristic rule based on structural properties of Petri Nets. We evaluate the method on several Petri Nets and parameterized
systems taken from the literature |ABG+95lKMP()IFmO:)IMCPi)| . and we 
compare the results with other finite and infinite-state verification tools. 



1 Introduction 

The theory of well-structured systems [ACJT96, FS01] gives us decision procedures to verify safety properties of parameterized systems modeled as Petri Nets [ACJT96, FS01], Lossy Vector Addition Systems [BM99], and Broadcast Protocols [EFM99]. The decision procedures are based on backward reachability algorithms like the one proposed in [ACJT96], whose termination (for Petri Nets and their extensions) is guaranteed by Dickson's lemma. It is important to recall that forward approaches like Karp-Miller's coverability tree are not robust when applied to extensions of Petri Nets like Broadcast protocols [EFM99].

Differently from the finite-state case, in parameterized verification symbolic representations are unavoidable in order to make the approach effective: we need to finitely represent infinite collections of states. In the backward approach of [ACJT96, FS01] we need to represent infinite, upward-closed sets of markings, when we restrict our attention to Petri Nets. Two examples of symbolic representations for upward-closed sets of markings are collections of minimal points and linear arithmetic constraints. The complexity of the algorithm of [ACJT96] is non-elementary. For this reason, naive implementations of the backward approach suffer from the symbolic state explosion problem: the number of minimal points or the size of the constraints becomes unmanageable after a few iterations. Symbolic state explosion is the counterpart of the state explosion problem we know from finite-state verification.

In our previous work [DR00] we proposed a new rich assertional language, in the terminology of [KMM+97], for representing compactly upward-closed sets of markings. Our data structures, which we call here Covering Sharing Trees (CSTs), are directed graphs in which we store the minimal points of an upward-closed set as a collection of tuples, and for which we allow the maximal sharing of prefixes and suffixes. To obtain efficient operations, it is crucial to avoid enumerating the paths of a CST. Working on the graph structure of CSTs, we defined all operations needed for lifting the backward reachability algorithm of [ACJT96] to the symbolic level. In the preliminary results given in [DR00], we managed to prove properties of Petri Nets (of small size) that could not be handled by other infinite-state model checkers (working backwards) like HyTech [HHW97].

Following our line of research, the conceptual contribution of this paper is a new heuristic rule for attacking symbolic state explosion based on the combination of CSTs and well known techniques for the static analysis of Petri Nets. More precisely, the heuristic rule is based on structural properties [ST98] of Petri Nets, i.e., on a fully automatic static analysis, whose results can be used during the backward reachability algorithm to significantly cut the search space. As the other techniques presented in [DR00], our structural heuristic works in polynomial time on the graph structure of CSTs. When combined with our CST-based symbolic representation, the heuristic rule allows us to scale up the dimension of the case-studies by one order of magnitude.

As practical contribution, we describe a set of benchmarks we obtained with 
an optimized implementation of the CST-library, integrated with the above men- 
tioned structural heuristic. We have applied the resulting model checking algo- 
rithm to a large set of examples of parameterized verification problems that 
can be solved using decision procedures for coverability of Petri Nets (e.g. mu- 
tual exclusion for the parametric models like the Mesh and Multipoll examples 
of IAHC-^9,^M(T??^ . and semi- liveness for the PNCSA protocol of |Fin98| 'l. We 
have also applied our method to verify safety properties of finite-state systems 
(e.g. some of the above mentioned examples for fixed values of the parameter). 
For these examples, we have compared our results with the results obtained with the specialized tool GreatSPN [CFGR95] for computing the reachability set of Petri Nets. As foreseen by Bultan in [Bul00], in most of the cases proving a parameterized property turns out to be more efficient than proving its finite-state instances.

Before entering into more details, in Section 2 we will briefly recall the main ideas behind the connection between parameterized systems, Petri Nets, and backward reachability, and in Section 3 the basics of CSTs. The new heuristic rule is presented in Section 4. The new symbolic algorithm is presented in Section 5; its practical evaluation is presented in Section 6. We finish the paper discussing related works and drawing some conclusions.

Fig. 1. An example of Petri Net with parametric initial marking $(K, 1, 1, 0, 0)$, $K \ge 1$.



2 Petri Nets and Verification of Safety Properties 



Following asynchronous concurrent systems (possibly with internal states 

modeled via Boolean variables pCRblj l can be naturally represented as Petri 
Nets in which places and transitions are used to model local states, internal actions and communication via rendez-vous. At this level of abstraction, processes can be viewed as indistinguishable black tokens. A marking $m = (m_1, \ldots, m_n)$, a mapping from places to non-negative integers, can be viewed as an abstraction of a global system state in which we only keep track of the number of processes in every state. The number of processes in the system is determined by the initial marking $m_0$. The backward reachability approach for verification of safety properties of Petri Nets is based on the following notions, taken from [ACJT96, FS01]. Given $m = (m_1, \ldots, m_n)$ and $m' = (m'_1, \ldots, m'_n)$, we say that $m \preceq m'$ ($m'$ is subsumed by $m$) if and only if $m_i \le m'_i$ for $i : 1, \ldots, n$. A set of markings $U$ is upward-closed if for any $m \in U$ and any $m'$ such that $m \preceq m'$, we have that $m' \in U$. Any upward-closed set in $\mathbb{N}^n$ can be finitely represented by its finite set of minimal points, which we will call $gen(U)$.
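
The ordering on markings and the representation of an upward-closed set by its minimal points can be sketched in a few lines. Markings are assumed to be tuples of non-negative integers; the function names are illustrative and not part of the CST library.

```python
def leq(m, n):
    """Componentwise ordering on markings: m <= n in every place."""
    return all(a <= b for a, b in zip(m, n))

def minimize(markings):
    """Minimal elements of a finite set of markings, i.e. gen(U) of its upward closure."""
    ms = set(markings)
    return {m for m in ms if not any(leq(n, m) and n != m for n in ms)}

def in_upward_closure(m, gens):
    """True iff m belongs to the upward-closed set generated by gens."""
    return any(leq(g, m) for g in gens)

# Generators of a set with at least two tokens in the last two places.
GENS = {(0, 0, 0, 2, 0), (0, 0, 0, 0, 2), (0, 0, 0, 1, 1)}
```

Adding a redundant marking such as (1, 0, 0, 2, 0) to GENS does not change the represented set, and `minimize` removes it again.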

The relation $\preceq$ is a well-quasi ordering. This property ensures the termination of backward reachability, whenever the starting point of the exploration is an upward-closed set of markings. As an example, consider the Petri Net of Fig. 1, a monitor for a parameterized system with two mutually exclusive critical sections ($cs_1$ and $cs_2$). Initially, all $K$ processes are in $p_1$. To enter $cs_1$, a process tests for the presence of processes in $cs_2$ using $p_2$, and locks $cs_1$ using $p_3$ (transition $t_1$), and vice versa. Processes leave the critical section using transitions $t_3$ and $t_4$. Note that the set $U$ of violations to mutual exclusion is the upward-closed set generated by the minimal violations (0,0,0,2,0), (0,0,0,0,2), and (0,0,0,1,1) (at least 2 tokens in $p_4 + p_5$). To prove that the protocol guarantees mutual exclusion for any value of $K$, it is enough to show that no admissible initial marking is in the set of predecessor markings $Pre^*(U)$ of $U$ ($Pre$ is the operator that returns the set of markings that reach some marking in $U$ by firing a transition). To compute $Pre^*(U)$, we iterate the application of the predecessor operator $Pre$
[Figure: backward reachability graph. Starting from the minimal violations (0,0,0,1,1), (0,0,0,2,0), and (0,0,0,0,2), the panels “Iteration 2” and “Iteration 3” show predecessor markings such as (1,1,1,0,1), (1,1,1,1,0), (2,2,1,0,0), (2,1,2,0,0), and (1,2,0,1,0); edges are labeled with transitions t1 to t4, and some nodes are annotated “Violate Inv”.]

Fig. 2. Backward Reachability Graph.



until we reach a fixpoint. During the computation, every newly generated marking is stored only if it is not subsumed by an already visited one. The backward reachability graph of our example is given in Fig. 2 (ignore the annotations for the moment). In Fig. 2 we have omitted all redundant markings (about 30). As mentioned in the introduction, the symbolic backward approach based on the enumeration of minimal points of sets of markings suffers from the symbolic state explosion problem. More sophisticated data structures are necessary to make the approach feasible in practice.
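
The enumeration-based backward approach described above can be sketched as follows. This is the naive minimal-points version, not the CST-based algorithm of this paper. A transition is assumed to be a pair (inp, out) of token vectors, and the three-place net (idle, lock, critical) is a hypothetical example, not the net of Fig. 1.

```python
def leq(m, n):
    return all(a <= b for a, b in zip(m, n))

def pre(m, transition):
    """Minimal marking from which firing `transition` yields a marking covering m."""
    inp, out = transition
    return tuple(max(mi - o, 0) + i for mi, i, o in zip(m, inp, out))

def backward_reach(targets, transitions):
    """Minimal points of Pre*(U), where U is the upward closure of `targets`."""
    visited = []
    frontier = list(targets)
    while frontier:
        m = frontier.pop()
        if any(leq(v, m) for v in visited):
            continue                      # subsumed by an already visited marking
        # Drop visited markings that the new one subsumes, then store it.
        visited = [v for v in visited if not leq(m, v)] + [m]
        frontier.extend(pre(m, t) for t in transitions)
    return set(visited)

# Hypothetical net with places (idle, lock, critical).
ENTER = ((1, 1, 0), (0, 0, 1))   # take one idle process and the lock, enter critical
LEAVE = ((0, 0, 1), (1, 1, 0))   # leave critical, release the lock
BAD = {(0, 0, 2)}                # at least two processes in the critical place
fixpoint = backward_reach(BAD, [ENTER, LEAVE])
```

For this assumed net the fixpoint has three minimal points, none of which is below an initial marking of the form (K, 1, 0), so mutual exclusion holds for every K.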



3 The Assertional Language: Covering Sharing Trees 

In [DR00], we studied the mathematical foundations of Covering Sharing Trees (CSTs), a new data structure to symbolically manipulate upward-closed sets. CSTs are based on the Sharing Trees of [ZL94]. A $k$-sharing tree $S$ is a rooted acyclic graph with nodes partitioned in $k$ layers (apart from the special root and end nodes) $N = \{root\} \cup N_1 \cup \ldots \cup N_k \cup \{end\}$, successor relation $succ : N \to 2^N$, and labeling function $val : N \to \mathbb{Z} \cup \{\top, \bot\}$, such that: (1) all nodes of layer $i$ have successors in the layer $i+1$; (2) a node cannot have two successors with the same label; (3) two nodes with the same label in the same layer do not have the same set of successors. The flat denotation of a sharing tree is defined as follows

$elem(S) = \{(val(n_1), \ldots, val(n_k)) \mid (\top, n_1, \ldots, n_k, \bot) \text{ is a path of } S\}$.

Conditions (2) and (3) ensure the maximal sharing of prefixes and suffixes among 
the tuples of the flat denotation of a sharing tree. The size of a sharing tree 
is the number of nodes and edges. The number of tuples in elem{S) can be 
exponentially larger than the size of $S$. As shown in [ZL94], given a set of tuples
$A$ of size $k$, there exists a unique (modulo isomorphisms of graphs) sharing
tree $S_A$ such that $elem(S_A) = A$. A CST is a sharing tree obtained by lifting the
denotation of a sharing tree from the flat one of [ZL94] to the following rich one

$cones(S) = \{ m \mid n \preceq m,\ n \in elem(S) \}.$

Given an upward closed set of markings $U$, we define the CST $S_U$ as the $k$-sharing
tree such that $elem(S_U) = gen(U)$. Thus, $S_U$ can be used to compactly

Fig. 3. An Example of CST.



represent $gen(U)$, and to finitely represent $U$. In the best case the size of $S_U$ is
logarithmic in the size of $gen(U)$. A CST $S_U$ can also be viewed as a compact
representation of the formula $\bigvee_{m \in elem(S_U)} (x_1 \geq m_1 \wedge \ldots \wedge x_n \geq m_n)$. As
an example, the CST $S$ that symbolically represents the set of violations of
our example is given in Fig. 3. Let us note that any $S'$ such that $gen(U) \subseteq elem(S')$
and such that all additional elements are redundant (i.e., are subsumed
by elements in $gen(U)$) can still be used to represent $U$. We will call such a CST
redundant. In the following we will show that it is often more efficient to work
with redundant CSTs. In [DR00], we have defined the operations needed to
implement a CST-based backward reachability procedure. The operations work
on the graph structure of CSTs. In the following we will use $Union_{CST}(S, T)$ to
indicate the CST whose denotation is $cones(S) \cup cones(T)$, and $Pre_{CST}(S, t)$ to
indicate the CST whose denotation is $cones(Pre(cones(S), t))$ for some transition
$t$. Checking subsumption between CSTs, namely whether $cones(S) \subseteq cones(T)$
holds, is co-NP hard (even if the two CSTs are
not redundant). In [DR00], we have defined a set of polynomial time sufficient
conditions (with different precision) to check subsumption for CSTs, based on
simulation relations between nodes of the corresponding sharing trees. Formally,
a node $n$ in the $i$-th layer of $S$ is forward-simulated by node $m$ in the $i$-th
layer of $T$ if and only if $val(n) \geq val(m)$ and for every successor node $n'$ of
$n$ there exists a successor $m'$ of $m$ that forward-simulates $n'$. If the root
node of $S$ is forward-simulated by the root node of $T$, then $S$ is subsumed by $T$.
Similar definitions and properties can be given for backward and mixed forward-backward
simulations. The operations $Pre_{CST}$ and $Union_{CST}$ do not guarantee
to generate CSTs that contain only the minimal points. However, removing all
redundancies is co-NP hard. As shown in [DR00], simulation relations helped
us again to obtain polynomial algorithms to partially eliminate redundancies.
(As a technical remark, we point out that these techniques allow us to remove
tuples of a given CST that are subsumed either by tuples of another CST or by
tuples of the same CST.) Unfortunately, CSTs and simulation-based heuristics
are not enough to mitigate symbolic state explosion. New heuristics for pruning
backward search seem necessary in order to handle large examples.
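
The forward-simulation test can be illustrated on a simplified structure. The sketch below uses plain prefix tries (nested dictionaries) rather than full sharing trees, so suffixes are not shared; this is a simplifying assumption, and the function names are illustrative. It shows that simulation is a sufficient condition for subsumption, and also that it is not a necessary one.

```python
def trie(tuples):
    """Layered graph with prefix sharing only: nested dicts keyed by label."""
    root = {}
    for tup in tuples:
        node = root
        for v in tup:
            node = node.setdefault(v, {})
    return root


def simulated(s, t):
    """True if every child of s is forward-simulated by some child of t.

    A child (vs, cs) is simulated by (vt, ct) when vs >= vt and the
    children of cs are again simulated by children of ct.
    """
    return all(
        any(vs >= vt and simulated(cs, ct) for vt, ct in t.items())
        for vs, cs in s.items()
    )


def cones_included(s_tuples, t_tuples):
    """Exact test by enumeration: each tuple of S dominates a tuple of T."""
    return all(
        any(all(a >= b for a, b in zip(m, g)) for g in t_tuples)
        for m in s_tuples
    )


# Simulation succeeds: S is subsumed by T.
S1 = [(1, 1, 1, 0, 1), (2, 0, 3, 1, 1)]
T1 = [(0, 0, 0, 0, 1), (1, 1, 1, 1, 0)]

# Subsumption holds, but no single node of T simulates the node labelled 2.
S2 = [(2, 0, 5), (2, 5, 0)]
T2 = [(0, 0, 5), (1, 5, 0)]
```

The second pair shows why the simulation-based test may answer "unknown" even when the exact (and expensive) inclusion test would succeed.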

4 Structural Heuristic 

In the backward reachability approach, every place of a Petri Net is initially
considered as unbounded (in fact, unsafe states are expressed via constraints like
$x_1 \geq c_1$, etc.). In many practical cases, however, some places are bounded for
any value of the parameters in the initial configuration. The Structural Theory of
Petri Nets [STC98] can help us to distinguish between bounded and unbounded
places. Let $N$ be a Petri Net with $n$ places, $m$ transitions, and token flow matrix
$C$ ($C$ describes how tokens are moved in the net by the transitions; rows correspond
to places, and columns to transitions). Furthermore, let $\cdot$ denote the
vector product $a^T \cdot b = a_1 b_1 + \ldots + a_n b_n$, where $a^T$ indicates the transpose of
vector $a$. Place invariants are one kind of information we can
compute via a static analysis of $N$. A place invariant (also called P-semiflow) is
a non-negative vector $p = (p_1, \ldots, p_n)$ that is a solution of the equation

$x^T \cdot C = 0, \quad x \geq 0,$

where $x$ is a vector of variables of dimension $n$. Given an initial marking $m_0$
and a place invariant $p$, the set $O(m_0, p) = \{ m \mid p^T \cdot m = p^T \cdot m_0 \}$ over-approximates
the reachability set of the Petri Net. This property follows from
the definition of place invariant, and from the state equation $m = m_0 + C \cdot \sigma$
that characterizes a generic marking $m$ reachable from $m_0$ via the sequence of
transitions represented by the firing vector $\sigma$. As a consequence,
the equation

$p^T \cdot x = p^T \cdot m_0$

for some place invariant $p$ gives us a structural invariant we can use to analyze
the net. Let us consider our running example. The three following equations are
invariants of the net in Fig. 1 with the parametric initial marking $(K, 1, 1, 0, 0)$:
(i) $x_2 + x_5 = 1$, (ii) $x_3 + x_4 = 1$, (iii) $x_1 + x_4 + x_5 = K$. Unfortunately, the
invariants are not sufficient to prove our mutual exclusion property $x_4 + x_5 \leq 1$.
Still, the invariants contain information that we can exploit during the backward
search. A possible way to use the structural analysis would be to apply what
is usually called program specialization, i.e., we can replace the subnet involving
places linked by structural invariants (e.g. $p_2, p_5$ for $x_2 + x_5 = 1$) with a
control part (a finite-state automaton). This way, however, the net resulting from
the specialization may become of unmanageable size. As an alternative, we propose
to use the structural invariants directly as heuristics for efficient backward
reachability.
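
The three invariants of the running example can be checked directly against the definition $p^T \cdot C = 0$, $p \geq 0$. The token flow matrix below is an assumption: it is derived from the transition encoding reconstructed from the text (rows are places $p_1$ to $p_5$, columns are $t_1$ to $t_4$), not copied from the original figure.

```python
# Assumed token flow matrix C = post - pre of the running example.
C = [
    [-1, -1,  1,  1],  # p1
    [ 0, -1,  0,  1],  # p2
    [-1,  0,  1,  0],  # p3
    [ 1,  0, -1,  0],  # p4
    [ 0,  1,  0, -1],  # p5
]


def dot(a, b):
    """Vector product a^T . b."""
    return sum(x * y for x, y in zip(a, b))


def is_place_invariant(p, flow):
    """p is a place invariant iff p >= 0 and p^T . C = 0."""
    columns = list(zip(*flow))
    return all(x >= 0 for x in p) and all(dot(p, col) == 0 for col in columns)


INV_I = (0, 1, 0, 0, 1)    # x2 + x5
INV_II = (0, 0, 1, 1, 0)   # x3 + x4
INV_III = (1, 0, 0, 1, 1)  # x1 + x4 + x5

K = 7  # any concrete value of the parameter
M0 = (K, 1, 1, 0, 0)
CONSTANTS = [dot(p, M0) for p in (INV_I, INV_II, INV_III)]
```

For the assumed matrix, `CONSTANTS` evaluates the right-hand sides $p^T \cdot m_0$; only the third one depends on `K`.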

4.1 Pruning the Backward Search Space 

Let $U$ be an upward-closed set of markings denoting unsafe states, and let $U' = U \cap O(m_0, p)$
for some place invariant $p$. We first note that if $U' = \emptyset$, then
we can immediately infer that the net is safe. However, as in our example,
invariants might not be sufficient to directly verify the property. We will use
them to prune the backward search as follows. Let us consider again our running
example and the backward reachability graph of Fig. 2. After the first iteration,
two generators $(1, 1, 1, 0, 1)$ and $(1, 1, 1, 1, 0)$ that are not subsumed by previous
elements are computed. The first generator defines a set of markings that has
no intersection with the set of markings defined by the invariant $x_2 + x_5 = 1$,

Fig. 4. The CST denoting $Pre_{CST}(S)$, where $S$ is given in Fig. 3.



while the second generator defines a set of markings that has no intersection
with the set of markings defined by the invariant $x_3 + x_4 = 1$. As a consequence,
we deduce that no markings defined by those two generators can be reached
from an instance of the parametrized initial marking (recall that the markings
satisfying the invariant over-approximate the set of reachable markings). We can
therefore stop the backward search after the first iteration instead of
having to consider 3 iterations as in the naive search. Let us now examine how
we can incorporate this idea in our CST-based backward search. Since $U'$ is not
upward-closed, it cannot be used as the starting point of our symbolic backward
search. The following theorem, however, gives us indications on how to proceed.

Theorem 1. Given a Petri Net $N$ with initial marking $m_0$, a place invariant
$p$, and an upward-closed set of markings $U$ represented by a CST $S$, suppose
$cone(m) \cap O(m_0, p) = \emptyset$ for some $m \in elem(S)$. Furthermore, let $S'$ be the
CST such that $elem(S') = elem(S) \setminus \{m\}$, and $m_0'$ be any instance of $m_0$.
Then,

$m_0' \in Pre^*(cones(S))$ iff $m_0' \in Pre^*(cones(S'))$.
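
For a single generator, the premise of the theorem can be tested without exploring its cone. A minimal sketch, assuming a non-negative invariant vector `p` and the constant `c` standing for $p^T \cdot m_0$: since every marking above `m` has a weight at least that of `m`, the cone misses the invariant as soon as the weight of `m` already exceeds `c`. This is a sufficient check only, and the function name is illustrative.

```python
def is_useless(m, p, c):
    """Sufficient test for cone(m) having no point with p . x = c (p >= 0)."""
    return sum(pi * mi for pi, mi in zip(p, m)) > c


X2_X5 = (0, 1, 0, 0, 1)  # invariant x2 + x5 = 1
X3_X4 = (0, 0, 1, 1, 0)  # invariant x3 + x4 = 1

GEN_A = (1, 1, 1, 0, 1)  # first generator of iteration 1
GEN_B = (1, 1, 1, 1, 0)  # second generator of iteration 1
```

With these values, `GEN_A` is discarded by the first invariant and `GEN_B` by the second, which matches the discussion of the running example.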

The theorem shows that during the computation of $Pre^*(U)$ we can prune the
search space by safely removing all elements $m \in elem(S_U)$ (redundant or not)
such that $cone(m)$ has empty intersection with the set of markings defined by a
structural invariant. We call such elements useless. To prune the space efficiently,
we must avoid the explicit enumeration of all elements stored in a CST. In fact,
the number of those elements is potentially exponential in the size of the CST.
Instead of trying to remove all the useless elements for a given invariant $p$, we use
a heuristic rule that works directly on the graph structure of the CST and does
not enumerate its paths. To describe the heuristic rule, we need the following
definitions. Let $S$ be a CST, and let $e = (v, w)$ be an edge of $S$ connecting nodes
of two adjacent layers. We define $elem_e(S)$ as the set of tuples from $elem(S)$
denoted by paths of $S$ passing through $e$. Formally, $m = (val(v_1), \ldots, val(v_n)) \in elem_e(S)$
iff there exists a path $(\top, v_1, \ldots, v, w, \ldots, v_n, \bot)$ in $S$ such that $e = (v, w)$.
Consider now a structural invariant, say $I$, having the form $p^T \cdot x = p^T \cdot m_0$,
where $p$ is a place invariant (hence, $p \geq 0$) such that $p^T \cdot m_0$
is an integer, i.e., $p^T \cdot m_0$ does not contain occurrences of the parameters (e.g.
we keep $x_2 + x_5 = 1$ and $x_3 + x_4 = 1$, and discard $x_1 + x_4 + x_5 = K$). Our
heuristic rule works by removing an edge $e$ of $S$ whenever we can prove that
the elements in $elem_e(S)$ denote cones that do not intersect with the structural
invariant $I$. To check this condition on the edge $e$ connecting a node of layer $i$
and a node of layer $i + 1$, we first compute the two values $min_{\leftarrow}(e)$ and $min_{\rightarrow}(e)$
defined as follows: $min_{\leftarrow}(e)$ is the minimal value of prefixes $(m_1, \ldots, m_i)$ of tuples
in $elem_e(S)$ evaluated by the function $p^T \cdot x$; symmetrically, $min_{\rightarrow}(e)$ is
computed for suffixes $(m_{i+1}, \ldots, m_n)$. Specifically, we define

$min_{\leftarrow}(e) = \min \{ p^T \cdot (m_1, \ldots, m_i, 0, \ldots, 0) \mid (m_1, \ldots, m_n) \in elem_e(S) \},$

$min_{\rightarrow}(e) = \min \{ p^T \cdot (0, \ldots, 0, m_{i+1}, \ldots, m_n) \mid (m_1, \ldots, m_n) \in elem_e(S) \}.$

The following two properties characterize our heuristic rule. 

Theorem 2. Given the initial marking $m_0$, the CST $S$, the structural property
$p^T \cdot x = p^T \cdot m_0$, and the edge $e$ of $S$, if $min_{\leftarrow}(e) + min_{\rightarrow}(e) > p^T \cdot m_0$, then
$cone(m) \cap O(m_0, p) = \emptyset$ for any $m \in elem_e(S)$.

Theorem 3. Given a CST $S$, an edge $e$, and the invariant $p^T \cdot x = p^T \cdot m_0$
such that $p^T \cdot m_0 \in \mathbb{Z}$, there exists a polynomial time algorithm that computes
the values $min_{\leftarrow}(e)$ and $min_{\rightarrow}(e)$.

Based on the previous properties, we can devise a procedure to heuristically cut
the CSTs produced during the backward search. As an example of application
of the structural heuristic, consider the CST of Fig. 4. The CST $S$ contains the
elements obtained at iteration 1; the pairs of values on the arcs are the values
$min_{\leftarrow}(e)$ and $min_{\rightarrow}(e)$ for the place invariant $x_2 + x_5 = 1$; the dashed edges can
be removed and thus the useless element $(1, 1, 1, 0, 1)$ is removed from the CST.
Note that if we use the invariant $x_3 + x_4 = 1$ then the last element can also be
eliminated. The heuristic rule simply traverses a CST layer by layer, removing
all edges that satisfy the hypothesis of Theorem 2. To complete the scenario,
we need to compute the structural invariants automatically. This can be done
using specialized libraries to compute place invariants like the one available with
GreatSPN [CFGR95].
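
The edge-removal rule can be sketched with two dynamic-programming passes over a layered graph, one computing minimal prefix weights and one computing minimal suffix weights, so that no path is enumerated during pruning. The graph encoding (`layers`, `val`, `succ`), the node names, and the omission of the root and end nodes are assumptions made for this illustration; the example graph stores the two generators of iteration 1 with a shared prefix and is not a copy of Fig. 4.

```python
import math


def prune_edges(layers, val, succ, p, c):
    """Drop every edge (v, w) whose minimal prefix weight at v plus
    minimal suffix weight at w exceeds c.

    Assumes each non-last node has a successor and each node is reachable.
    """
    down = {v: p[0] * val[v] for v in layers[0]}  # min prefix weight ending at v
    for i, layer in enumerate(layers[:-1]):
        for v in layer:
            for w in succ[v]:
                cand = down[v] + p[i + 1] * val[w]
                down[w] = min(down.get(w, math.inf), cand)
    up = {}  # min suffix weight starting at v
    for i in range(len(layers) - 1, -1, -1):
        for v in layers[i]:
            tail = min((up[w] for w in succ[v]), default=0)
            up[v] = p[i] * val[v] + tail
    return {v: [w for w in succ[v] if down[v] + up[w] <= c] for v in succ}


def elements(layers, val, succ):
    """Enumerate the tuples denoted by complete paths (for inspection only)."""
    last = set(layers[-1])

    def walk(v):
        if v in last:
            yield (val[v],)
        for w in succ[v]:
            for tail in walk(w):
                yield (val[v],) + tail

    return sorted(t for r in layers[0] for t in walk(r))


LAYERS = [["a1"], ["a2"], ["a3"], ["b4", "c4"], ["b5", "c5"]]
VAL = {"a1": 1, "a2": 1, "a3": 1, "b4": 0, "c4": 1, "b5": 1, "c5": 0}
SUCC = {"a1": ["a2"], "a2": ["a3"], "a3": ["b4", "c4"],
        "b4": ["b5"], "c4": ["c5"], "b5": [], "c5": []}

AFTER_X2_X5 = prune_edges(LAYERS, VAL, SUCC, (0, 1, 0, 0, 1), 1)
AFTER_X3_X4 = prune_edges(LAYERS, VAL, SUCC, (0, 0, 1, 1, 0), 1)
AFTER_BOTH = prune_edges(LAYERS, VAL, AFTER_X2_X5, (0, 0, 1, 1, 0), 1)
```

Both passes visit each edge once, so the cost is linear in the size of the graph for a fixed invariant.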

5 Symbolic Backward Reachability 

The three main problems we had to solve to obtain an efficient CST-based backward
reachability algorithm were: (1) avoid generating too many redundant
elements during the fixpoint computation; (2) use an efficient fixpoint test using
sufficient conditions for CST-subsumption; (3) remove useless elements (elements
that cannot be reached from the given initial state). As a practical solution
to those problems, we propose the algorithm of Fig. 5. The algorithm uses
simulation-based heuristics to remove redundancies and to test subsumption
between CSTs, in combination with the heuristic rule proposed in the previous
section. Let us give some more detail on the algorithm of Fig. 5. The variable $S$
stores the current frontier of the breadth-first search performed by the algorithm. The
variable $T$ stores the set of visited generators. Before entering the main loop,
we need to test subsumption between $S$ and $T$. For this purpose, the following
heuristic seems to work well in practice. We first compute the forward and backward
simulation relations between the nodes of $S$ and the nodes of $T$. If the root
Proc Pre*_CST(S_U : CST)
    S := S_U; T := empty_CST;
    while not(Subsumes_CST(T, S)) do
        T := Union_CST(T, S); R := empty_CST;
        for each transition t do
            N := Pre_CST(S, t);
            structural_reduction_CST(N);
            remove_redundancies_CST(N, R, T);
            minimize_CST(N);
            R := Union_CST(N, R);
        S := R;
    return T;

Fig. 5. The CST-Based Symbolic Model Checking Algorithm. 



of $S$ is forward simulated by the root of $T$ or if the end node of $S$ is backward
simulated by the end node of $T$, then we know that all the generators of $S$ are
subsumed by some generators of $T$ (see [DR00]); thus the fixpoint is reached. If
the test fails, we perform a depth-first, top-down visit of the CST $S$ in order to
compare its tuples with those of $T$. During the depth-first visit, however, we use
the information previously computed via the forward simulation as follows. Each
time we reach a node $n$ that is forward simulated by a node of $T$, we stop the
exploration: all the elements in the subtree rooted at $n$ will be subsumed by
elements of $T$. In the main loop, we compute the new frontier $N$ transition by
transition via the symbolic operator $Pre_{CST}(S, t)$. In order to keep the size of $N$
small, after computing $Pre_{CST}(S, t)$, we first apply the new heuristic rule (via the
function structural_reduction_CST), and then we apply simulation-based heuristics
to remove redundancies. The function remove_redundancies_CST(N, R, T)
uses simulation relations between nodes of $N$ and nodes of $R$ (the CST collecting
the generators created via all transitions) and $T$; the function minimize_CST(N)
uses simulation relations between nodes of $N$. We discuss the practical evaluation of
the resulting algorithm in the following section.
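
The control flow of Fig. 5 can be mirrored on explicit sets of generators. In the sketch below, Python sets stand in for CSTs and exact subsumption checks stand in for the simulation-based heuristics, so it illustrates the structure of the loop rather than the CST implementation; the net encoding is the one assumed for the running example, and all names are illustrative.

```python
# Assumed encoding of the running example: (pre, post) vectors of t1..t4.
TRANSITIONS = [
    ((1, 1, 1, 0, 0), (0, 1, 0, 1, 0)),
    ((1, 1, 1, 0, 0), (0, 0, 1, 0, 1)),
    ((0, 0, 0, 1, 0), (1, 0, 1, 0, 0)),
    ((0, 0, 0, 0, 1), (1, 1, 0, 0, 0)),
]
UNSAFE = {(0, 0, 0, 2, 0), (0, 0, 0, 0, 2), (0, 0, 0, 1, 1)}
# Parameter-free invariants as (p, p . m0): x2 + x5 = 1 and x3 + x4 = 1.
INVARIANTS = [((0, 1, 0, 0, 1), 1), ((0, 0, 1, 1, 0), 1)]


def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


def covered(m, gens):
    """m lies in the cone of some generator."""
    return any(leq(g, m) for g in gens)


def pre(m, t):
    pre_t, post_t = t
    return tuple(p + max(x - q, 0) for x, p, q in zip(m, pre_t, post_t))


def useless(m, invariants):
    return any(sum(a * b for a, b in zip(p, m)) > c for p, c in invariants)


def pre_star(unsafe, transitions, invariants=()):
    S, T, rounds = set(unsafe), set(), 0
    while not all(covered(m, T) for m in S):                 # Subsumes(T, S)
        rounds += 1
        T |= S                                               # Union(T, S)
        R = set()
        for t in transitions:
            N = {pre(m, t) for m in S}                       # Pre(S, t)
            N = {m for m in N if not useless(m, invariants)}  # structural reduction
            N = {m for m in N if not covered(m, R | T)}      # remove redundancies
            N = {m for m in N if not covered(m, N - {m})}    # minimize
            R |= N
        S = R
    return T, rounds


NAIVE, NAIVE_ROUNDS = pre_star(UNSAFE, TRANSITIONS)
PRUNED, PRUNED_ROUNDS = pre_star(UNSAFE, TRANSITIONS, INVARIANTS)
```

On this small example the invariant-based reduction empties the frontier after the first pass of the loop, while the variant without invariants needs several passes before the fixpoint test succeeds.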



6 Experimental Results 

Based on a new optimized implementation of the CST-library presented in
[DR00], and using the library for computing minimal place invariants (a system
of generators for the positive solutions of $x^T \cdot C = 0$) coming with GreatSPN
[CFGR95], we have implemented the algorithm of Fig. 5 and tested it on several
types of verification problems expressible in terms of coverability of markings for
Petri Nets. The parameters taken into consideration in our evaluation are listed
in Fig. 6.

Parameterized Problems. More precisely, we have considered mutual exclusion
properties for parameterized, concurrent and production systems like the
Multipoll, the Mesh 2x2 of [ABC+95] (Fig. 130, p. 256), its extension
Size of the Petri Net
    P = Number of places;
    T = Number of transitions.
Verification problem (VP) (only the type of property)
    ME = Mutual exclusion property;
    C = Covering for a random marking;
    SL = Semi-liveness.
Use of heuristics
    I = Invariant-based reductions (structural heuristic rule);
    S = Simulation-based reductions.
Statistics: execution
    EX = Execution time (in seconds) on an AMD Athlon 900 MHz;
    NI = Number of iterations before termination (with * = before stopping the execution).
Quality of analysis
    R = An initial state has been reached.
Statistics: use of memory
    MaxE (MaxN) = Number of elements (nodes) of the biggest CST associated to S of Fig. 5;
    NE (NN) = Number of elements (nodes) of the CST for the fixpoint.
Ratio of memory saving (using CSTs)
    RM = MaxN / (MaxE × P) in pct.;
    RN = NN / (NE × P) in pct.



Fig. 6. Parameters of the Experimental Evaluation. 



to the 3x2 case, the CSM of [ABC+95] (Fig. 76, p. 154), and for an extension
of the Readers-Writers example in which we use several buffers
with 45 slots. Furthermore, we have considered semi-liveness and coverability
problems for the PNCSA communication protocol.
The experimental results are listed in Fig. 7. We ran every example either
enabling or disabling the structural heuristic rule and the reductions based
on simulation relations. As shown in Fig. 7, the heuristics turned out to be
fundamental to ensure termination in reasonable time for most of the examples.
To compare our results with other tools for infinite-state systems, we ran some of the
parameterized examples like CSM and Mesh using HyTech [HHW97], an efficient model checker
based on polyhedra (i.e. a constraint solver over the reals). In
the experiments on the largest examples (using backward analysis) HyTech was
still computing after more than one day.

Finite-state Problems. After having fixed the value of the parameter $K$ in the
initial marking, we have also tested some case-studies using the specialized Petri
Net tool GreatSPN [CFGR95]. GreatSPN uses efficient encodings of markings
and simplification rules that reduce the input net to produce the reachability
set of bounded Petri Nets. We performed our experiments on a Pentium 133 MHz,
measuring the value of $K$ from which GreatSPN is not able to compute the entire
reachability graph: $K = 3$ for the Mesh 2x2; $K = 9$ for Multipoll, and $K = 115$
for CSM. In contrast, as shown in Fig. 7, we managed to verify mutual exclusion
properties for any value of $K$ (assuming $K \geq 1$ in the initial marking) with the
following execution times: 1.26s for Mesh 2x2; 1.05s and 324s for Multipoll; and
0.04s for CSM. As already noticed by Bultan in other case-studies [Bul00], lifting
a verification problem from the finite-state to the parameterized case can make
its solution easier! Also note that the use of invariants makes the backward analysis
sensitive to the initial marking. This effect is clear when looking at the execution
times obtained using different values for $K$ for the Mesh 2x2 in Fig. 7 (e.g., we
found more useful invariants for $K = 1$ than for $K > 1$).

Finally, we have also considered safety properties for non-parametric examples
(i.e., where it makes no sense to put parameters in the initial marking)
like the classical Peterson's and Lamport's mutual exclusion algorithms.
As a result, we managed to prove safety properties for all these
examples with negligible execution times.

7 Related Works and Conclusions 

In this paper we have presented a new heuristic rule, based on the structural theory
of Petri Nets, to be used in the backward approach of [ACJT96, FS01]. Efficient
algorithms allow us to apply the heuristic rule while avoiding the enumeration of the
minimal points of upward-closed sets generated in the computation of $Pre^*$. This
way, we manage to mitigate the symbolic state explosion in practical examples
we did not manage to handle with previous backward technology. With the set of
benchmarks of Fig. 7 we hope it will be possible to establish connections with
other recent attempts at attacking symbolic state explosion. The
combination of structural and enumerative techniques has been studied before
in the context of forward reachability, where invariants are used as heuristics
for efficient encodings of markings [CFGR95]. Structural properties are
also used to statically compute over-approximations of the reachability set of a
Petri Net [STC98]. We are not aware of previous attempts at combining
structural heuristics and backward reachability.
CST-Based Symbolic Backward Reachability: Practical Evaluation

Fig. 7. The experimental results have been obtained using an AMD Athlon 900
MHz. The parameters of the evaluation are described in Fig. 6.

Abstract. We present a model checking algorithm for safety properties
that is applicable to parameterized systems and hence provides a unifying
approach of model checking for parameterized systems. By analysing
the conditions under which the proposed algorithm terminates, we obtain
a characterisation of a subclass for which this problem is decidable.
The known decidable subclasses, token rings and broadcast systems, fall
in our subclass, while the main novel feature is that (unnested) quantification
over index variables is allowed, which means that global guards
can be expressed.



1 Introduction 

We present a model checking algorithm for safety properties that is applica- 
ble to parameterized systems. A parameterized system is a family of systems, 
one for each instantiation of the parameter, where an instantiation by n is the 
composition of n copies of the system, and the verification problem consists in 
checking whether all instantiations fulfil a given property. Model checking for
parameterized systems has been shown to be undecidable in general, so the
problem can only be approached for subclasses or by semi-algorithmic methods. 
Solutions based on different technical frameworks have been proposed. Since our 
model checking algorithm is applicable to parameterized systems in general, it 
provides a unifying method for different subclasses. 

By exploring under which restrictions the algorithm terminates, we obtain a 
characterisation of a class of parameterized systems for which model checking of 
safety properties is decidable. It turns out that the restrictions concern almost 
exclusively the way the copies communicate with each other: The admissible 
forms of communication are the very restricted synchronisation of token rings (no 
information exchanged between neighbours except “I have/have not the token”) 
and the anonymous synchronisation of broadcast protocols. Another form of 
communication, using values of variables in neighbouring copies in guards or 
assignments like in the Dining Philosophers example, has to be restricted. We 
can however allow communication by global variables and by global guards, 
(expressed by universal quantification over index variables, which run over the 
instantiated copies). The latter is used e.g. in the Bakery algorithm, and in cache
coherence protocols with a global condition considered in [Del00], and extends the
known subclasses of decidable systems.

As specification logic, we consider a linear time logic built over state predicates
that can contain index variables. We restrict ourselves to safety properties;
liveness properties of broadcast protocols were shown to be undecidable in
[EFM99], so decidability of model-checking certainly does not extend to liveness
properties for the whole class we characterise.

Related Work. Model checking for parameterized systems has been addressed by 
many researchers. One line of research is concerned with restrictions that, when 
imposed on parameterized systems, make model checking decidable for safety 
properties or even full temporal logic. The systems considered there are either
token rings, like in [EN95], or broadcast protocols, which were introduced in [EN98]
and also considered in [EFM99]; related work also deals with a restricted form of broadcast
protocols. Both types are subsumed by our subclass.

The approach for broadcast systems in [EN98] and [EFM99] falls under the
paradigm of using well-ordered sets for the verification of infinite-state systems
[ACJT96]. All sets that are considered are sets that are upward closed with respect
to a well order. This means that all guards of transitions have to describe upward
closed sets, which excludes certain global conditions, e.g. the one used in the
cache coherence protocol that Delzanno considered in [Del00].

Regular model checking, advocated by e.g. [KMM+97], is based
on representing sets of states by regular languages. Termination is obtained by
applying some form of acceleration in order to compute the transitive closure of
the transition relation [BJNT00]. Emphasis lies not on detecting decidable classes
but on providing general, not necessarily exact methods to handle a large class
of systems. In this context, handling global guards was considered in [ABJN99];
our method provably terminates on the examples considered there. The main
difference is that our work is not based on acceleration techniques.

The specification language we use is rather general: The properties considered
by [EN95] had a restricted number of quantified variables, and the properties for
broadcast protocols considered in [EN98, EFM99] can only express upward closed
properties about numbers of processes being in a certain control state.

Overview. First we explain the types of parameterized systems we consider. The 
third section contains our model checking algorithm for ordinary systems, while 
in the fourth section it is adapted to parameterized systems, and the conditions 
under which the algorithm terminates are given. Missing proofs and details can 
be found in the full version at http://www.dcs.ed.ac.uk/~morLika 

2 Framework 

As program notation we use concurrent state-based guarded-command systems;
a system is hence of form $(V, C_1, \ldots, C_n, I)$, where $V$ is a set of variables, $C_i$ are
components and $I$ is a predicate describing the initial states. A component consists
of a set of transitions, where guards and assignments are built over boolean
or enumerative variables or integer terms. More precisely, the terms occurring on
the right-hand side of assignments are terms of Presburger arithmetic, enumerative
constants or formulas of Presburger arithmetic¹, depending on the sort of
the left-hand side of the assignment, and guards are formulas of Presburger arithmetic.
The restriction to Presburger arithmetic, i.e. to multiplication only with
constants, guarantees decidability. A step is defined by choosing some component
and one of its transitions with a guard that is satisfied in the current state,
and performing all assignments of this transition simultaneously. As an example
of the program notation, consider Table 1, the well-known Bakery algorithm, in
a version for 2 components.

Table 1. Program Text for the 2-Component Bakery Algorithm. 



V: c1, c2 : {T, W, C}, n1, n2 : NAT
I: c1 = T ∧ c2 = T ∧ n1 = 0 ∧ n2 = 0

Component 1:
    c1 = T                            -->  c1 := W, n1 := max(n1, n2) + 1
    c1 = W ∧ (n2 = 0 ∨ n1 < n2)       -->  c1 := C
    c1 = C                            -->  c1 := T, n1 := 0

Component 2:
    c2 = T                            -->  c2 := W, n2 := max(n1, n2) + 1
    c2 = W ∧ (n1 = 0 ∨ n2 < n1)       -->  c2 := C
    c2 = C                            -->  c2 := T, n2 := 0
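
The step semantics (pick a component, pick an enabled transition, perform its assignments simultaneously) can be tried out on Table 1 with a small explicit-state exploration. Because the tickets can grow without bound, the sketch only explores states whose tickets stay below a chosen bound; that bound, the state layout `(c1, c2, n1, n2)` and the function names are assumptions of this illustration, and a bounded search is of course no proof.

```python
from collections import deque


def successors(state):
    """One step of the 2-component system: all enabled transitions."""
    c1, c2, n1, n2 = state
    # Component 1
    if c1 == "T":
        yield ("W", c2, max(n1, n2) + 1, n2)
    if c1 == "W" and (n2 == 0 or n1 < n2):
        yield ("C", c2, n1, n2)
    if c1 == "C":
        yield ("T", c2, 0, n2)
    # Component 2
    if c2 == "T":
        yield (c1, "W", n1, max(n1, n2) + 1)
    if c2 == "W" and (n1 == 0 or n2 < n1):
        yield (c1, "C", n1, n2)
    if c2 == "C":
        yield (c1, "T", n1, 0)


def reachable(bound):
    """States reachable from I while both tickets stay <= bound."""
    init = ("T", "T", 0, 0)
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if max(nxt[2], nxt[3]) <= bound and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


STATES = reachable(bound=6)
MUTEX_HOLDS = all(not (s[0] == "C" and s[1] == "C") for s in STATES)
```

Each `yield` builds the whole successor tuple from the old state, which is how the simultaneous assignment of a transition is modelled here.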



2.1 Parameterized Systems 

Before describing our notion of parameterized systems, we first have to define 

their state language. 

Definition 1 (Index Predicates). Index terms are of form $j + k$ or $k$, where
$j$ is an index variable and $k$ is an integer constant.

Index predicates are defined as follows:

— Basic index predicates are the formulas of Presburger arithmetic without
quantification², where variables can (but need not) be indexed by index terms.

— We say that the index term $j + k$ occurs in the index predicate $p$ if there is
some variable $y$ occurring in $p$ in form $y[j + k]$.

— If $p$ is a basic index predicate and $j_1, \ldots, j_n$ index variables s.t. all index
terms occurring in $p$ that contain some $j_i$ have constant 0, then
$\forall j_1 \ldots \forall j_n\, (a \to p)$ and $\exists j_1 \ldots \exists j_n\, (a \wedge p)$
are index predicates, where $a$ is a conjunction of expressions of form $j_i \neq k$
or $j_i \neq j + k$ for an index variable $j$.

— Index predicates are closed under boolean operations.

¹ For simplicity, we consider boolean variables and equations between enumerative
variables also to be formulas of Presburger arithmetic.

² Since the system variables are intended to have finite domain, quantification over
integer variables would not add expressive power.
The restriction to unnested quantification is used in the proof of Theorem 1,
while the restriction that a quantified variable $j$ can only occur in index terms
without constants is necessary for termination of the model checking algorithm,
more precisely to guarantee that no new index terms are generated by instantiating
quantifiers.

Models of index predicates with respect to $n$ consist of a valuation $v$ for the
occurring index variables in $\{0, \ldots, n-1\}$ and of a valuation $s$ for the system
variables, where for every indexed system variable $y$, $s$ defines values for
$y[0], \ldots, y[n-1]$. We write $s, v \models_n p$ if $s$ and $v$ form a model for $p$ with respect
to $n$.

A parameterized system $S = (V, C[i], I)$ differs from an ordinary system
in that the transitions of $C[i]$ are parameterized by the index variable $i$.³ Accordingly,
some variables of $V$ are indexed, while the others act as global variables.
The guards of a parameterizable component $C[i]$ are index predicates, but we
do not allow quantification over index variables on the right-hand side of assignments.
This guarantees that the predicates generated during model checking
remain in the class of index predicates. The only index variable appearing
freely in transitions of $C[i]$ is $i$, and on the left-hand side of an assignment we
only allow $i$ as index term, without constant, which means that a copy can only
modify its own variables. This is not necessary but simplifies the computation of
weakest preconditions. The initial predicate $I$ is a closed index predicate. A
parameterized version of the Bakery algorithm, shown in Table 2, illustrates
the notion of parameterized system.



Table 2. Parameterized Bakery Algorithm. 



V: c[i] : {T, W, C}, n[i] : INT
I: ∀i (n[i] = 0 ∧ c[i] = T)

    c[i] = T                                         -->  c[i] := W, n[i] := (max_j n[j]) + 1
    c[i] = W ∧ ∀j (j ≠ i → n[i] < n[j] ∨ n[j] = 0)   -->  c[i] := C
    c[i] = C                                         -->  c[i] := T, n[i] := 0






For a natural number $n$, the instantiation $S[n]$ of a parameterized system
$S$ is $(V[n], C[0], \ldots, C[n-1], I[n])$, where $V[n]$ is the set of ordinary variables
of $V$ together with, for every indexable variable $y$ in $V$, $y[0], \ldots, y[n-1]$, and
where for a natural number $h$, $C[h]$ is obtained by replacing $i$ by $h$ in all indexed
expressions. All expressions are intended to be modulo $n$. This can be done on
the syntactic level by replacing all terms and predicates $X$ by $X \bmod n$ as follows:
For a variable $y$ and an index term $i + k$, $y[i + k] \bmod n$ is $y[(i + k) \bmod n]$;
this is extended in the usual way to terms and basic predicates. For quantified
predicates, we define $\forall j\, (a \to p) \bmod n$ to be $\bigwedge_{0 \leq h < n} (a \bmod n \to p[j := h] \bmod n)$
and $\exists j\, (a \wedge p) \bmod n$ to be $\bigvee_{0 \leq h < n} (a \bmod n \wedge p[j := h] \bmod n)$.

³ The proposed approach is applicable to systems composed of different parameterizable
components and of ordinary components. For simplicity of the presentation,
this generalization is omitted.

The interpretation of (i + k) mod n under a given valuation v is the natural 
one. Note that the instantiation of a parameterized system is an ordinary system 
since i is the only free index variable occurring in C[i].

The following decidability result is crucial for the model checking procedure
we present. For any index predicate p and any fixed n, satisfiability of p mod n
is decidable since the domains of the free index variables can be restricted to
{0, ..., n-1}. Satisfiability in some instantiation is decidable as well, since it
suffices to check satisfiability for some large enough n, more precisely for n
such that all of the index terms and all of the existentially quantified variables
can be interpreted with different values. This argument does not hold for nested
quantification: if variable j is existentially quantified in the scope of a
universal quantification over i, then for every value for i the possibility of a
new value for j has to be considered, but this new value for j must in turn be
taken into consideration as a value for i, which results in a cycle.

Theorem 1. Satisfiability of index predicates is decidable. 
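The argument for a fixed n can be made concrete with a small brute-force check. The following is only a sketch under stated assumptions: predicates are encoded as Python functions, the only indexed variable is the control variable c with sort {T, W, C}, and the names `satisfiable_mod_n`, `two_critical` and `only_j1_active` are hypothetical. It is not the decision procedure of the paper, which argues via a sufficiently large n.

```python
from itertools import product

def satisfiable_mod_n(pred, index_vars, n, sort=("T", "W", "C")):
    """Return True if pred holds for some state c[0..n-1] and some index valuation."""
    for c in product(sort, repeat=n):                  # all values of c[0], ..., c[n-1]
        for values in product(range(n), repeat=len(index_vars)):
            v = dict(zip(index_vars, values))          # valuation into {0, ..., n-1}
            if pred(c, v, n):
                return True
    return False

def two_critical(c, v, n):
    # j1 != j2  and  c[j1] = C  and  c[(j2 + 1) mod n] = C
    return v["j1"] != v["j2"] and c[v["j1"]] == "C" and c[(v["j2"] + 1) % n] == "C"

def only_j1_active(c, v, n):
    # (forall j (j != j1 -> c[j] = T)) mod n, expanded into a finite conjunction over h
    return all(c[h] == "T" for h in range(n) if h != v["j1"])
```

For example, `satisfiable_mod_n(two_critical, ["j1", "j2"], 2)` holds, while the same predicate is unsatisfiable for n = 1 because j1 and j2 cannot differ.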

In a specification logic for parameterized systems, it is desirable to be able
to quantify over index variables. For example, the property of mutual exclusion
in the Bakery example can be formulated as:
$\forall j_1 \forall j_2\,(j_1 \neq j_2 \to G(c[j_1] \neq C \vee c[j_2] \neq C))$.
More generally, we consider formulas of the form
$\forall j_1 \cdots \forall j_m\,(a \to \varphi)$, where $\varphi$ is an LTL formula
with index predicates as state formulas, where $j_1, \ldots, j_m$ are the free
index variables occurring in $\varphi$ and where $a$ is a conjunction of
inequalities $j_l \neq [i +] k$ for some index variable $i$. In [Mai01] we specified a
fragment of LTL for which it is possible to allow universal quantification over
index variables anywhere in the formula.

Now we can formally state the model-checking problem for parameterized
systems:

$S \models \forall j_1 \cdots \forall j_m\,(a \to \varphi)$ if, for all $n$, $S[n], s \models_n (\forall j_1 \cdots \forall j_m\,\varphi) \bmod n$.



2.2 Types of Parameterized Systems 

The characterisation we give of parameterized systems for which model checking
is decidable concerns mainly the communication between different copies. A
possible way of communication is to read the value of a variable of a different
copy. This is the case if on the right-hand side of an assignment or in a guard, an
expression y[i + k] occurs, where k is not zero, and hence the transition depends
on the value of the variable y in the k-th neighbour. An example of this type
of communication is the following, a transition from the Dining Philosophers
algorithm.

$c[i] = h \wedge pr[i] \wedge \neg pr[i-1] \wedge c[i-1] \neq e \wedge c[i+1] \neq e \;\to\; \langle c[i] := e,\ pr[i] := \mathit{false} \rangle$

While this general form of communication has to be restricted severely (see
Definition 4), other forms of communication, namely that of token rings and
broadcast systems, are unproblematic.

Both token rings and broadcast systems have a control state, i.e. there is
a special variable c of enumerative sort $D_{control}$ such that the guard of every
transition is of the form c[i] = d ∧ p for some index predicate p, and every
transition contains an assignment c[i] := d'.

A token ring [EN95] is a control-state component C[i] for which some
transitions are marked by send and some by rec. In all transitions of C[i], only
i occurs as index term, so a copy can only read the value of its own variables
or that of global variables. With the execution of a rec transition, the copy
acquires the token, while with a send transition, the token is passed to the right
neighbour. We require that send and rec transitions alternate along every path
through C[i]. Token rings are not executed in a completely asynchronous fashion
like the general parameterized systems we consider here: to execute send or rec
transitions, neighbouring copies have to synchronise. For any instantiation with
n and some 0 ≤ h < n, a transition in C[h] marked by rec can only be executed
in parallel with a send transition in C[h - 1] (or C[n - 1] if h = 0), and
vice versa. By $D_{token}$ we denote the set of control states in a token ring in which
the copy “has the token”, i.e. which are reachable by a sequence of transitions such
that the last marked transition was a rec transition.

Table 3. Illinois Protocol.



V : c[i] : {invalid, exclusive, dirty, shared}

I : $\forall i\,(c[i] = \mathit{invalid})$

{readW), {writeW), (rep!!) c[i\ = invalid — > ^ c[i] = invalid^ 

(read??) = invalid ^ \ 

\ “ip — > c[i] := exclusive 



p abbreviates the global guard: $\exists j\,(j \neq i \wedge c[j] \neq \mathit{invalid})$



A broadcast component [EN98] is again a control-state component C[i],
and communication between copies is only possible in the form of synchronisation.
The copy to synchronise with is not fixed; it can be any other copy
that can execute a matching transition. Moreover, a broadcast synchronisation
is possible: in such a synchronisation step, all copies execute a transition. More
formally, let $\Sigma = \Sigma_{rv} \cup \Sigma_{bc}$ be an action alphabet, where
$\Sigma_{rv}$ and $\Sigma_{bc}$ are disjoint finite sets. The transitions can be
marked by a! or a? for $a \in \Sigma_{rv}$, or by a!! or a?? for $a \in \Sigma_{bc}$.
The semantics of broadcast systems is as follows: A transition
marked by a? in some component can only be executed simultaneously with a
transition marked by a! in some other component and vice versa. A broadcast
action a!! can only be executed simultaneously with an action marked by a??
in all other components, and an action marked by a?? can only be executed in
such a situation.
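The synchronisation rules above can be illustrated by an explicit successor function for a fixed number of copies. This is a sketch only, under the assumptions that guards consist solely of the control state, that transitions are given as hypothetical triples (label, source, target) shared by all copies, and that unlabelled transitions (label None) are local steps.

```python
from itertools import product

def successors(state, transitions):
    """state: tuple of control states, one per copy; transitions: (label, src, dst)."""
    n = len(state)

    def moves(k, label):
        # targets of copy k's enabled transitions carrying the given label
        return [dst for (lab, src, dst) in transitions
                if lab == label and src == state[k]]

    result = set()
    for (lab, src, dst) in transitions:
        for h in range(n):
            if src != state[h]:
                continue
            if lab is None:                       # local step of copy h
                result.add(state[:h] + (dst,) + state[h + 1:])
            elif lab.endswith("!!"):              # broadcast: all others take a??
                a = lab[:-2]
                options = [[dst] if k == h else moves(k, a + "??") for k in range(n)]
                for combo in product(*options):   # empty if some copy cannot react
                    result.add(tuple(combo))
            elif lab.endswith("!"):               # rendezvous with one other copy
                a = lab[:-1]
                for k in range(n):
                    if k != h:
                        for dst_k in moves(k, a + "?"):
                            new = list(state)
                            new[h], new[k] = dst, dst_k
                            result.add(tuple(new))
    return result
```

A broadcast is blocked as soon as one of the other copies has no matching a?? transition, which is why `product` over an empty option list yields no successor.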

The cache coherence protocol used as example in ITCTTni can be modelled as 
a broadcast protocol, as partly shown in Table 3, and provides an example of a
broadcast protocol with global guards. 

3 Model Checking by Tree Construction 

Model checking of safety linear temporal logic formulas can be reduced to
model checking of formulas of the form EF p for some state predicate p by using
an automaton that accepts the bad prefixes for a safety linear temporal logic
formula [KV99]. The fixed point approach for verifying that S satisfies EF q
uses the fact that the set of states satisfying EF q is the least fixed point of
the functional $F(X) = \{ s \mid s \models q \} \cup \neg wp.S.\neg X$. We carry out
the fixpoint computation on predicates, i.e. we compute a state predicate which
characterises the least fixed point of F, starting with the empty predicate false.
The ingredients needed to do this are:

— Decidability of implication and satisfiability for state predicates. The former 
is needed to decide whether a fixed point has been reached, and the latter 
for deciding whether an initial state satisfies an expression. 

— For a given state predicate p, a state predicate representing the weakest 
precondition of the set of states satisfying p must be computable.
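As an illustration of the fixed point computation, the following sketch iterates F on explicit state sets rather than on predicates. It assumes a 2-component Bakery instance whose tickets are bounded by a hypothetical constant `B` so that the state space is finite; the names `ef_states`, `successors`, `STATES` and `TARGET` are introduced here for illustration only.

```python
from itertools import product

B = 3  # assumed ticket bound, only to keep the state space finite
STATES = list(product("TWC", range(B + 1), "TWC", range(B + 1)))  # (c1, n1, c2, n2)

def successors(s):
    """Successor states of s in the bounded 2-component Bakery algorithm."""
    out = []
    for i in (0, 1):
        c, n = [s[0], s[2]], [s[1], s[3]]
        j = 1 - i
        if c[i] == "T" and max(n) + 1 <= B:
            c[i], n[i] = "W", max(n) + 1           # take a ticket
        elif c[i] == "W" and (n[i] < n[j] or n[j] == 0):
            c[i] = "C"                             # enter the critical section
        elif c[i] == "C":
            c[i], n[i] = "T", 0                    # leave and reset the ticket
        else:
            continue
        out.append((c[0], n[0], c[1], n[1]))
    return out

def ef_states(states, succ, target):
    """Least fixed point of F(X) = target ∪ {s | some successor of s lies in X}."""
    x = set()
    while True:
        new = set(target) | {s for s in states if any(t in x for t in succ(s))}
        if new == x:                               # fixed point reached
            return x
        x = new

TARGET = {s for s in STATES if s[0] == "C" and s[2] == "C"}
BAD = ef_states(STATES, successors, TARGET)
```

In this instance the initial state ("T", 0, "T", 0) is not in `BAD`, so no state violating mutual exclusion is reachable from it.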

For a program S and a given state predicate p, a predicate wp.S.p that
represents $wp.S.\{ s \mid s \models p \}$ is easy to define: Let Tr(C) be the set of
all transitions of component C, where a transition is of the form
$g \to \langle v_0 := t_0, \ldots, v_k := t_k \rangle$; then

$$wp.S.p = \bigwedge_{C \text{ comp. of } S} \ \bigwedge_{g \to \langle v_0 := t_0, \ldots, v_k := t_k \rangle \in Tr(C)} \big( g \to p[v_0 := t_0, \ldots, v_k := t_k] \big),$$

where $[v_0 := t_0, \ldots, v_k := t_k]$ denotes simultaneous
substitution.
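The definition can be mirrored semantically: instead of substituting terms into a predicate, one evaluates the predicate on the updated state. The sketch below does this for the 2-component Bakery algorithm; it assumes states encoded as pairs of tuples ((c1, c2), (n1, n2)) and predicates encoded as Python functions, and the helper names are hypothetical.

```python
def update(s, i, c, n):
    """Return the state obtained from s by setting c[i] := c and n[i] := n."""
    cs, ns = list(s[0]), list(s[1])
    cs[i], ns[i] = c, n
    return (tuple(cs), tuple(ns))

def transitions(i):
    """Guarded commands (guard, update) of copy i in the 2-component Bakery algorithm."""
    j = 1 - i
    return [
        (lambda s: s[0][i] == "T",
         lambda s: update(s, i, "W", max(s[1]) + 1)),
        (lambda s: s[0][i] == "W" and (s[1][i] < s[1][j] or s[1][j] == 0),
         lambda s: update(s, i, "C", s[1][i])),
        (lambda s: s[0][i] == "C",
         lambda s: update(s, i, "T", 0)),
    ]

ALL = transitions(0) + transitions(1)

def wp(p):
    """wp.S.p: every enabled transition leads to a state satisfying p."""
    return lambda s: all(not g(s) or p(u(s)) for g, u in ALL)

def pre(p):
    """The dual of wp, i.e. not wp.S.(not p): some successor satisfies p."""
    return lambda s: not wp(lambda t: not p(t))(s)
```

For q = (c1 = C ∧ c2 = C), `pre(q)` holds exactly in the states where one copy waits with an enabled guard while the other is already critical.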

The specific feature of our algorithm is that the fixed point computation is
represented in the form of a tree over $\mathbb{N}$, i.e. as a set of finite sequences
over $\mathbb{N}$ that is closed under prefix-formation: The root is denoted by
$\varepsilon$, and e.g. 00 is the first successor of the first successor of the root.
Note that the algorithm works directly on the program notation.

Definition 2 (Proof Tree Construction). Let q be a predicate of the form
$\bigwedge_i p_i$ for literals $p_i$. The proof tree $T(EF\,q, S)$ is a tree over
$\mathbb{N}$ with labelling
$l : T(EF\,q, S) \to \{ \bigwedge_i p_i \mid p_i \text{ literals} \}$ such that

(Footnote) The set $\neg wp.S.\neg X$ consists of all states that have at least
one successor in X.

(Footnote) For a system S and a set of states X, the weakest precondition wp.S.X
is the set of states s of S such that all successors of s lie in X.
- $l(\varepsilon) = q$ and

- $\bigvee_j l(xj) \Leftrightarrow \neg wp.S.\neg l(x)$.

A node $x \in T(EF\,q, S)$ is a leaf if one of the following conditions holds:

(i) l(x) is not satisfiable (unsuccessful leaf);

(ii) $I \wedge l(x)$ is satisfiable (where I is the initial predicate of the system),
i.e. there is an initial state of S satisfying l(x) (successful leaf);

(iii) there is a node $y \in T(EF\,q, S)$ such that $|x| > |y|$ and
$l(x) \Rightarrow l(y)$ (unsuccessful leaf).

The tree $T(EF\,q, S)$ is successful if it has a successful leaf. The following
theorem states the correctness of the algorithm.

Theorem 2. (a) $S \models AG(\neg q)$ if and only if $T(EF\,q, S)$ is not successful.

(b) If all sorts of variables of S are finite, then the construction of $T(EF\,q, S)$
terminates.

To illustrate the tree construction, the first steps of the proof tree
construction for the 2-component Bakery algorithm are shown in Figure 1. For the
root, $\neg wp.S.\neg(c_1 = C \wedge c_2 = C)$ is
$(c_1 = W \wedge (n_1 < n_2 \vee n_2 = 0) \wedge c_2 = C) \vee (c_1 = C \wedge (n_2 < n_1 \vee n_1 = 0) \wedge c_2 = W)$,
and by transforming this into disjunctive normal form, the four successors of
the root node are obtained.



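[Figure: proof tree. Root: $c_1 = C \wedge c_2 = C$. Successors of the root:
$c_1 = W \wedge n_1 < n_2 \wedge c_2 = C$;
$c_1 = W \wedge n_2 = 0 \wedge c_2 = C$;
$c_1 = C \wedge n_2 < n_1 \wedge c_2 = W$;
$c_1 = C \wedge n_1 = 0 \wedge c_2 = W$.
Further nodes shown:
$c_1 = T \wedge \max(n_1, n_2) + 1 < n_2 \wedge c_2 = C$ (unsat.);
$c_1 = W \wedge n_1 < n_2 \wedge n_2 < n_1 \wedge c_2 = W$ (unsat.);
$c_1 = W \wedge n_1 < n_2 \wedge n_1 = 0 \wedge c_2 = W$.]

Fig. 1. Proof Tree for the 2-Component Bakery Algorithm.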

An advantage of using a tree structure to represent the least fixed point is 
that the predicates labelling the nodes are relatively small and automatically 
only elements of the frontier set are considered for the next iteration step. This 
distinguishes the approach from other approaches to use Presburger arithmetic 
for representing sets of states during the fixed point iteration IHCPhTI and makes 
it easier to check the Conditions (i), (ii) and (iii). Producing counterexamples is 
easy since the paths in the tree form potential refutation sequences. 



It is irrelevant for the correctness which representation in disjunctive normal form 
is chosen. 
4 Model Checking for Parameterized Systems

4.1 Adaption of the Tree Construction to Parameterized Systems 

The first of the requirements listed in Section 3 for the proof tree construction
is guaranteed by Theorem 1. It remains to explain the computation of
$\neg wp.S.\neg p$ for a parameterized system S. For an ordinary parameterized
system S and a given instantiation with n, and for an index predicate p without
free index variables, wp.S[n].p can be defined as in Section 3. This easily
generalizes to token rings or broadcast systems: All possibilities of
synchronisation have to be considered.

For the computation of wp.C[t].p, where C[i] is the component of S, t is an
index term, and p is an index predicate, it has to be clear which indexed variables
are modified by a transition of C[t]. So if p contains other index terms t', either
t ≠ t' or t = t' has to be assumed to be able to compute wp.C[t].p. The obvious
solution is to consider all possibilities of dividing the index terms in p into those
that are equal to t and those that are not equal to t, and to compute wp.C[t].p
for each of those possibilities. To make this precise, we need some notation. Let
$t_1, \ldots, t_n$ be the free index terms occurring in p. For an ordinary
parameterized system, we define

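- $G(p) = \{ Equ(t_{i_1}, \ldots, t_{i_k}) \mid \{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\} \}$, where

- $Equ(t_{i_1}, \ldots, t_{i_k}) = t_{i_1} = t_{i_2} \wedge \cdots \wedge t_{i_{k-1}} = t_{i_k} \wedge \bigwedge_{r \in \{1, \ldots, n\} \setminus \{i_1, \ldots, i_k\}} t_{i_1} \neq t_r$

- $Equ(\emptyset) = \bigwedge_{1 \le r \le n} i^* \neq t_r$, where $i^*$ is a new index variable.

For an element $g \in G(p)$, let rep(g) be $t_{i_1}$ or $i^*$, respectively. For token
rings, in addition sets of index terms that equal rep(g) + 1 or rep(g) - 1 have to
be selected, and for broadcast systems, a set of index terms standing for the
component that is chosen for synchronisation has to be chosen.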

For $g \in G(p)$, by p(g) we denote the result of replacing all index terms that
are by g equal to rep(g) by rep(g) throughout p, and by replacing quantified
subexpressions as follows:
$(\forall j\,(a \to p))(g) = (a \to p)[j := rep(g)](g) \wedge \forall j\,((a \wedge j \neq rep(g) \to p)(g))$;
the definition for existential quantification is analogous, using disjunction.

By $wp^{comp}.C[t].p$ we denote the result of computing the weakest precondition
of p (e.g. by the formula given in Section 3) by considering variables with index
terms syntactically different from t to be not modifiable by C[t]. The following
lemma states that wp.S[n].p can be represented by computing
$wp^{comp}.C[rep(g)].p(g)$ for all elements g of G(p). Note that
$p[\{ j := v(j) \mid j \text{ free index variable in } p \}] \bmod n$
does not contain free index variables and can hence be considered as an ordinary
predicate over the system variables of S[n].

Lemma 1. For all n, states s and valuations v with respect to n:

for all valuations w of the index variables introduced in G(p),
$s, v + w \models_n \big( \bigwedge_{g \in G(p)} g \to wp^{comp}.C[rep(g)].p(g) \big) \bmod n$

if and only if

$s, v \models_n wp.S[n].(p[\{ j := v(j) \mid j \text{ free index variable in } p \}] \bmod n)$.



(Footnote) Note that i + k = i + k' can be satisfiable in some instantiation even if k ≠ k'.
Lemma 1 implies that for parameterized systems, a proof tree can be
constructed generically for all instantiations S[n] by computing

$$\neg wp.S.\neg q = \bigvee_{g \in G(q)} \big( g \wedge \neg wp^{comp}.C[rep(g)].\neg q(g) \big).$$

The key property is the following: There is some n and s, v with respect to n, and
a valuation w for the new index variables in G(q) such that
$s, v + w \models_n \neg wp.S.\neg q$, iff there is a successor s' of s in S[n] such
that $s', v \models_n q$.

So we can now define the adaption of the proof tree construction for
parameterized systems. Note that there is only one additional termination
condition, which only applies to token rings.

Definition 3 (Proof Tree Construction for Parameterized Systems).

Let p be an index predicate. $T^{para}(EF\,p, S)$ is a tree over $\mathbb{N}$ with
labels of the form $\bigwedge_i p_i$, where $p_i$ is either a quantifier-free index
predicate which is a literal, or a quantified index predicate.

- $l(\varepsilon) = p$ and

- $\bigvee_j l(xj) = \neg wp.S.\neg l(x)$.

A node $x \in T^{para}(EF\,p, S)$ is a leaf if one of the following holds:

(i) l(x) is not satisfiable (unsuccessful leaf);

(ii) $I \wedge l(x)$ is satisfiable (successful leaf);

(iii) there is a node $y \in T^{para}(EF\,p, S)$ such that $|x| > |y|$ and
$\exists l(x) \Rightarrow \exists l(y)$, where $\exists$ denotes existential
quantification over all free index variables (unsuccessful leaf).

(iv) For a token ring component C[i]: Let $i + k_1, \ldots, i + k_r$ be all index
terms containing i that occur in p (note that in C[i], only i occurs as index term)
and let $k_1$ and $k_r$ be maximal resp. minimal among $\{k_1, \ldots, k_r\}$. Then
all labels containing $i + k_r - 2$ as index term are leaves (unsuccessful leaf).
Furthermore, all labels that have more than one token are leaves, i.e. those
containing literals of the form c[t] = d and c[t'] = d' for $d, d' \in D_{token}$
and for two index terms t and t' such that t ≠ t', where c is the control-state
variable of C[i] (unsuccessful leaf).

Note that the existential quantification in Condition (iii) can be applied since
I is a closed index predicate. Condition (iv) is correct because, due to the
restricted communication, if a label contains an expression $P[i + k_r - k]$
that contains $i + k_r - k$ as index term for some k > 2, then there also is a node
that does not differ in the expressions that contain $i + k_1, \ldots, i + k_r$ as
index terms and besides that only contains $P[i + k_r - 2]$. It follows that if the
former node is satisfiable, then already the latter is satisfiable, under the
additional requirement that for token rings, the initial state is uniform in all
indices except 0. Furthermore, due to the synchronisation, labels that contain
index terms $i + k_1 + k$ for some k > 1 necessarily contain several tokens.
[Figure: proof tree whose root is labelled
$j_1 \neq j_2 \wedge c[j_1] = C \wedge c[j_2] = C$; the remaining node labels are
not legible in this copy.]

Fig. 2. Proof Tree for the Parameterized Bakery Algorithm.



Figure 2 gives an example of the proof tree construction for a parameterized
system. The first successor of the root represents the condition that has to hold
if $C[j_2]$ takes a step, the second successor covers the case that $C[j_1]$ takes
a step, and the third successor is the dual of the weakest precondition for the
case that $C[i_1]$, where $i_1$ is a new variable, takes a step. Since this is
obviously only necessary if there are global variables (which could be modified by
$C[i_1]$), this step is omitted in the rest of the tree. Only parts of the tree are
displayed; the full tree has 31 nodes. The full paper also discusses the proof tree
construction for the Illinois protocol.






Monika Maidl 



4.2 Parameterized Systems for which Model Checking of Safety 
Properties Is Decidable 

By exploring under which conditions the proof tree construction terminates, we
obtain a characterisation of parameterized systems for which model checking of
safety properties is decidable. The main observation is that Condition (iii) holds
eventually along a path of the proof tree if for a given index variable j, only
finitely many index terms of the form j + k occur in labels of the proof tree. This
holds for broadcast systems and token rings (for the latter due to Condition (iv)),
but not for parameterized systems C[i] in which expressions of the form i + k for
k ≠ 0 occur in guards or in right-hand sides of assignments, as in the example
transition in Section 2.2. The following restriction, however, guarantees this
property.

Definition 4 (Component with bounded communication). A component
C[i] has unbounded communication if there is a sequence of indexed variables
$y_0, y_1, \ldots, y_n$ such that $y_0 = y_n$, and for all j < n, there is a transition
tr of C[i] that contains an assignment $y_j[i] := t$ such that for some integer
constant $k_j$, $y_{j+1}[i + k_j]$ occurs in guard(tr) or in t, and $k_j \neq 0$ for
at least one j. A component has bounded communication if no such sequence exists.
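Definition 4 amounts to a cycle check on a dependency graph between indexed variables. The sketch below assumes the dependencies have already been extracted from the transitions as hypothetical triples (y, z, k), meaning that some transition assigns y[i] and reads z[i + k] in its guard or right-hand side; the function name is illustrative.

```python
def has_unbounded_communication(deps):
    """deps: iterable of (y, z, k); True if some cycle uses an edge with k != 0."""
    deps = list(deps)
    graph = {}
    for y, z, _ in deps:
        graph.setdefault(y, set()).add(z)

    def reaches(src, dst):
        # depth-first search; a node reaches itself with the empty path
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        return False

    # an edge y -> z with k != 0 lies on a cycle iff z can reach y again
    return any(k != 0 and reaches(z, y) for y, z, k in deps)
```

For the Dining Philosophers transition of Section 2.2, the assignment to c[i] reads c[i - 1], so the triple ("c", "c", -1) already forms such a cycle.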

We can now define the subclass of parameterized systems for which our model 
checking algorithm terminates: 

Definition 5 (Terminating parameterized system). C[i] is a terminating 
parameterized component if 

1. it does not have unbounded communication, or

2. it is a token ring, where the initial predicate must be uniform for all indices
except the index 0, or

3. it is a broadcast component.

The proof tree construction terminates for terminating parameterized sys- 
tems if the domains of system variables are finite: Condition (iii) must apply 
eventually along a path of the proof tree: A label is determined by choosing a 
set of literals and a set of quantified expressions, and a set of index terms that 
occur in the label. There are two types of index variables, those occurring in the 
property (only finitely many) and those introduced by G{p). The number of the 
latter is not bounded but the number of different such index variables within 
one label is bounded by the maximal iteration depth of quantifiers occurring in 
the property or in some guard plus 1. As argued above, the number of differ- 
ent index variables also bounds the number of different index terms. So up to 
equivalence after existentially quantifying over index variables (which is used in 
Condition (iii)), there are only finitely many possibilities for labels. 

We can summarise the properties of the extended proof tree construction in
the following theorem. By using an automaton representing bad prefixes for
$\varphi$, this extends to formulas $\forall j_1 \ldots \forall j_m\,(a \to \varphi)$
with $\varphi$ an LTL safety formula.

Theorem 3. (a) $S \models \forall j_1 \ldots \forall j_m\,(a \to AG(\neg p))$ iff
$T^{para}(EF\,p, S)$ is not successful.

(b) If all system variables of S have finite domains, then the construction of
$T^{para}(EF\,p, S)$ terminates.
Abstract. We present a model-checker for boolean programs with (possibly
recursive) procedures and the temporal logic LTL. The checker is
guaranteed to terminate even for (usually faulty) programs in which the 
depth of the recursion is not bounded. The algorithm uses automata to 
finitely represent possibly infinite sets of stack contents and BDDs to 
compactly represent finite sets of values of boolean variables. We illus- 
trate the checker on some examples and compare it with the Bebop tool 
of Ball and Rajamani. 



1 Introduction 

Boolean programs are C programs in which all variables and parameters (call-by 
value) have boolean type, and which may contain procedures with recursion. In 
a series of papers, Ball and Rajamani have convincingly argued that they are a 
good starting point for investigating model checking of software m- 

Ball and Rajamani have also developed Bebop, a tool for reachability analysis 
in boolean programs. As part of the SLAM toolkit, Bebop has been successfully
used to validate critical safety properties of device drivers 0. Bebop can de- 
termine if a point of a boolean program can be reached along some execution 
path. Using an automata-theoretic approach it is easy to extend Bebop to a tool 
for safety properties. However, it cannot deal with liveness or fairness proper- 
ties requiring to examine the infinite executions of the program. In particular, 
it cannot be used to prove termination. 

In this paper we overcome this limitation by presenting a model-checker 
for boolean programs and arbitrary LTL-properties. The input to the model 
checker are symbolic pushdown systems (SPDS), a compact representation of 
the pushdown systems studied in A translation of boolean programs into 
this model is straightforward. The checker is based on the efficient algorithms 
for model checking ordinary pushdown systems (PDS) of P|. While SPDSs have 
the same expressive power as PDSs, they can be exponentially more compact. 
(Essentially, the translation works by expanding the set of control states with 
all the possible values of the boolean variables.) Therefore, translating SPDSs 
into PDSs and then applying the algorithms of 0| is very inefficient. We follow 
a different path: We provide symbolic versions of the algorithms of 0 working 
on SPDSs, and use BDDs to succinctly encode sets of (tuples of) values of the
boolean variables. 



G. Berry, H. Comon, and A. Finkel (Eds.): CAV 2001, LNCS 2102, pp. 324-ESSI 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 
This paper (and its full version pj) contribute symbolic versions of the
algorithms of tuned to minimise the number of required BDD variables; an
efficient implementation including three heuristic improvements; some experi- 
mental results on different versions of Quicksort; and, finally, a performance 
comparison with Bebop using an example of Q ■ 

The paper is structured as follows. PDSs and SPDSs are introduced in Sec- 
tion 13 The symbolic versions of the algorithms of 0 are presented in Section 0 
and their complexity is analysed. In particular, we analyse the complexity in 
terms of the number of global and local variables. In Section 0 we discuss the 
improvements in the checker and present our results on verifying Quicksort; in 
particular we analyse the impact of the improvements. Section El contains the 
comparison with Bebop, and Section 0 contains conclusions. 

2 Basic Definitions 

In this section we briefly introduce the notions of pushdown systems and linear 
time logic, and establish our notations for them. 

2.1 Pushdown Systems 

We mostly follow the notation of 0. A pushdown system is a four-tuple
$\mathcal{P} = (P, \Gamma, c_0, \Delta)$ where $P$ is a finite set of control
locations, $\Gamma$ is a finite stack alphabet, and
$\Delta \subseteq (P \times \Gamma) \times (P \times \Gamma^*)$ is a finite set of
transition rules. If $((q, \gamma), (q', w)) \in \Delta$ then we write
$\langle q, \gamma \rangle \hookrightarrow \langle q', w \rangle$. A configuration of
$\mathcal{P}$ is a pair $\langle p, w \rangle$ where $p \in P$ is a control location
and $w \in \Gamma^*$ is a stack content. $c_0$ is called the initial
configuration of $\mathcal{P}$. The set of all configurations is denoted by $\mathcal{C}$.

If $\langle q, \gamma \rangle \hookrightarrow \langle q', w \rangle$, then for every
$v \in \Gamma^*$ the configuration $\langle q, \gamma v \rangle$ is an immediate
predecessor of $\langle q', wv \rangle$, and $\langle q', wv \rangle$ an immediate
successor of $\langle q, \gamma v \rangle$. The reachability relation $\Rightarrow$
is the reflexive and transitive closure of the immediate successor relation. A run
of $\mathcal{P}$ is a sequence of configurations such that for each two consecutive
configurations $c_i c_{i+1}$, $c_{i+1}$ is an immediate successor of $c_i$.

The predecessor function $pre : 2^{\mathcal{C}} \to 2^{\mathcal{C}}$ of $\mathcal{P}$
is defined as follows: c belongs to pre(C) if some immediate successor of c belongs
to C. The reflexive and transitive closure of pre is denoted by pre*. We define
post(C) and post* similarly.
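These definitions can be tried out on explicit configurations. The following sketch makes several assumptions that are not part of the paper: rules are hypothetical 4-tuples (p, gamma, q, w), stack contents are tuples with the top of the stack first, and `post_star` is cut off at a maximal stack height so that the enumeration stays finite. The algorithms of the paper work on automata representing regular sets instead.

```python
def post(config, rules):
    """Immediate successors of configuration (p, w); w is a tuple, top of stack first."""
    p, w = config
    if not w:                        # empty stack: no rule applies
        return set()
    return {(q, tuple(u) + w[1:])
            for (p0, gamma, q, u) in rules
            if p0 == p and gamma == w[0]}

def pre(targets, candidates, rules):
    """Members of candidates with at least one immediate successor in targets."""
    return {c for c in candidates if post(c, rules) & set(targets)}

def post_star(start, rules, max_height):
    """Configurations reachable from start, restricted to stacks of bounded height."""
    seen, frontier = {start}, [start]
    while frontier:
        for succ in post(frontier.pop(), rules):
            if len(succ[1]) <= max_height and succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return seen

# Hypothetical example rules: <p, a> -> <q, b a>  and  <q, b> -> <p, epsilon>
RULES = [("p", "a", "q", ("b", "a")), ("q", "b", "p", ())]
```

With these rules, the configuration ("p", ("a", "c")) has the single immediate successor ("q", ("b", "a", "c")), and the part of the stack below the top symbol is left untouched.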

In the next section, when we model boolean programs as pushdown systems,
we will see that it is natural to consider a product form for $P$ and $\Gamma$. More
precisely, it is convenient to introduce sets $P_0$ and $G$ such that
$P = P_0 \times G$, and sets $\Gamma_0$ and $L$ such that $\Gamma = \Gamma_0 \times L$.
$G$ and $L$ are called sets of global and local values, since they are, loosely
speaking, the possible valuations of the global and local variables of the program,
respectively. So, for the rest of the paper, we assume

$P = P_0 \times G$ and $\Gamma = \Gamma_0 \times L$.

A symbolic pushdown system is a pushdown system in which sets of transition
rules are represented by symbolic transition rules. Formally, a symbolic pushdown
system is a tuple $\mathcal{P}_S = (P, \Gamma, c_0, \Delta_S)$, where $\Delta_S$ is a
set of symbolic transition rules of the form
$\langle p, \gamma \rangle \hookrightarrow^{R} \langle p', \gamma_1 \ldots \gamma_n \rangle$,
and $R \subseteq (G \times L) \times (G \times L^n)$ is a relation. A symbolic
pushdown system corresponds to a normal pushdown system
$(P_0 \times G, \Gamma_0 \times L, c_0, \Delta)$ in the sense that a symbolic rule
$\langle p, \gamma \rangle \hookrightarrow^{R} \langle p', \gamma_1 \ldots \gamma_n \rangle$
denotes a set of transition rules as follows:

if $(g, l, g', l_1, \ldots, l_n) \in R$, then $\langle (p, g), (\gamma, l) \rangle \hookrightarrow \langle (p', g'), (\gamma_1, l_1) \ldots (\gamma_n, l_n) \rangle \in \Delta$.

In practice, R should have an efficient symbolic representation. In our
applications we have $G = \{0, 1\}^n$ and $L = \{0, 1\}^m$ for some n and m, and so
R can be represented by a BDD.
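The correspondence between a symbolic rule and the explicit rules it denotes can be spelled out directly. This sketch assumes the relation R is given as an explicit set of tuples (g, l, g', (l_1, ..., l_n)) rather than as a BDD; the function `expand` and the example statement are hypothetical.

```python
def expand(p, gamma, relation, p2, gammas):
    """Explicit rules denoted by the symbolic rule <p, gamma> -R-> <p2, gammas>."""
    rules = set()
    for (g, l, g2, ls) in relation:
        assert len(ls) == len(gammas)              # one local value per pushed symbol
        head = ((p, g), (gamma, l))
        body = ((p2, g2), tuple(zip(gammas, ls)))
        rules.add((head, body))
    return rules

# Assumed example: one global bit x and one local bit; the statement "x := not x"
# on a flow-graph edge from n1 to n2 leaves the local bit unchanged.
NEGATE_X = {(g, l, 1 - g, (l,)) for g in (0, 1) for l in (0, 1)}
RULES = expand("p", "n1", NEGATE_X, "p", ("n2",))
```

One symbolic rule here stands for four explicit rules; with n global and m local bits the explicit set grows with 2^(n+m), which is the blow-up that the symbolic representation avoids.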

Given a pushdown system $\mathcal{P} = (P, \Gamma, c_0, \Delta)$, we use
$\mathcal{P}$-automata to represent regular sets of configurations of $\mathcal{P}$.
A $\mathcal{P}$-automaton uses $\Gamma$ as alphabet, and $P$ as set of initial states.
Formally, a $\mathcal{P}$-automaton is a tuple $\mathcal{A} = (\Gamma, Q, \delta, P, F)$
where $Q$ is a finite set of states, $\delta \subseteq Q \times \Gamma \times Q$ is a
set of transitions, $P$ is the set of initial states and $F \subseteq Q$ is the set of
final states. An automaton accepts or recognises a configuration
$\langle p, w \rangle$ if $p \xrightarrow{w} q$ for some $p \in P$, $q \in F$. The set
of configurations recognised by an automaton $\mathcal{A}$ is denoted by
$Conf(\mathcal{A})$. A set of configurations of $\mathcal{P}$ is regular if it is
recognised by some automaton.
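Acceptance by such an automaton is ordinary word acceptance, started in the state given by the control location. A minimal sketch, assuming the automaton is given as a hypothetical pair (delta, final) with delta a set of triples (state, symbol, state) and the stack content as a tuple with the top of the stack first:

```python
def accepts(automaton, config):
    """True if the automaton recognises configuration (p, w)."""
    delta, final = automaton
    p, w = config
    current = {p}                                  # start in the control location
    for gamma in w:                                # read the stack top-down
        current = {q2 for (q1, a, q2) in delta if q1 in current and a == gamma}
    return bool(current & final)

# Assumed example: recognises all configurations <p, a^k b> with k >= 0
EXAMPLE = ({("p", "a", "s"), ("s", "a", "s"), ("p", "b", "f"), ("s", "b", "f")}, {"f"})
```

Because the initial states are the control locations, a single automaton of this kind can represent configurations with different control locations at once.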

A symbolic $\mathcal{P}_S$-automaton is a tuple
$\mathcal{A}_S = (\Gamma_0, Q, \delta_S, P_0, F)$, where the symbolic transition
relation is a function
$\delta_S : (Q \times \Gamma_0 \times Q) \to 2^{G \times L \times G}$. The relation
$\delta_S$ should be seen as the symbolic representation of the transition relation
$\delta$: $\delta_S(q, \gamma, q')$ is the set of all $(g, l, g')$ such that
$((q, g), (\gamma, l), (q', g')) \in \delta$. If $R \subseteq (G \times L \times G)$,
we denote by $q \xrightarrow[R]{\gamma} q'$ the set of transitions
$((q, g), (\gamma, l), (q', g'))$ such that $(g, l, g') \in R$. In the sequel,
$\mathcal{P}$-automata and symbolic $\mathcal{P}_S$-automata are just called automata
and symbolic automata, respectively.

2.2 The Model-Checking Problem for LTL 

We briefly recall the results of [3]. Given a formula $\varphi$ of LTL, the
model-checking problem consists of deciding if $c_0$ violates $\varphi$, that is
whether there is some run starting at $c_0$ that violates $\varphi$. The problem is
solved there using the automata-theoretic approach. First, a Büchi pushdown system
is constructed as the product of the original pushdown system and a Büchi automaton
$\mathcal{B}$ for the negation of $\varphi$. This new pushdown system has a set of
final control states. We define a new reachability relation $\Rightarrow^r$ with
respect to this set; we write $c \Rightarrow^r d$ if d can be reached from c while
visiting some final control state along the way. Now, define the head of a
transition rule $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle$ as
the configuration $\langle p, \gamma \rangle$. A head $\langle p, \gamma \rangle$ is
repeating if there exists $v \in \Gamma^*$ such that
$\langle p, \gamma \rangle \Rightarrow^r \langle p, \gamma v \rangle$ holds. We denote
the set of repeating heads by Rh. It is shown there that the model-checking problem
reduces to either checking whether $c_0 \in pre^*(Rh\,\Gamma^*)$, or, equivalently,
checking whether $post^*(\{c_0\}) \cap Rh\,\Gamma^* \neq \emptyset$. Furthermore, it is
shown that the problem can be solved in
$O(|P|^2\,|\Delta|\,|\mathcal{B}|^3)$ time and $O(|P|\,|\Delta|\,|\mathcal{B}|^2)$
space.

3 Modelling Programs as Symbolic Pushdown Systems 

Pushdown systems find a natural application in the analysis of sequential
programs with procedures (written in C or Java, for instance). We allow arbitrary
int x;

void main() {
    int z;

    f(x+z);
}

void f(int y) {
    x=x+y;
}




Fig. 1. An Example Program (left) and the Associated Flowgraph (right). 



recursion, even mutual procedure calls, between procedures; however, we require 
that the data types used in the program be finite. In the following, we present 
informally how to derive a symbolic pushdown system from such a program. 

In a first step, we represent the program by a system of flow graphs, one 
for each procedure. The nodes of a flow graph correspond to control points in 
the procedure, and its edges are annotated with statements, e.g. assignments or 
calls to other procedures. Non-deterministic control flow is allowed and might 
for instance result from abstraction. Figure 1 shows a small C program and the
corresponding flow graphs. The procedure main ends in an infinite loop to ensure
that all executions are infinite. In the example, a finitary fragment of the type
integer has to be chosen.

Given such a system of flow graphs, we derive a pushdown system and a 
corresponding symbolic pushdown system. For simplicity, we assume that all 
procedures have the same local variables. The sets G and L contain all the 
possible valuations of the global and local variables, respectively. E.g., if the 
program contains three boolean global variables and each procedure has two 
boolean local variables, then we have $G = \{0, 1\}^3$ and $L = \{0, 1\}^2$. $P_0$
contains one single element p, while $\Gamma_0$ is the set of nodes of the flow graphs.

Program statements are translated to pushdown rules of three types. 

Assignments. An assignment labelling a flow-graph edge from node $n_1$ to node $n_2$
is represented by a set of rules of the form

$\langle glob, (n_1, loc) \rangle \hookrightarrow \langle glob', (n_2, loc') \rangle$,

where glob and glob' (loc and loc') are the values of the global (local) variables
before and after the assignment. This set is represented by a symbolic rule of the
form $\langle p, n_1 \rangle \hookrightarrow^{R} \langle p, n_2 \rangle$, where
$R \subseteq (G \times L) \times (G \times L)$.

Procedure Calls. A procedure call labelling a flow-graph edge from node $n_1$ to
node $n_2$ is translated into a set of rules with a right-hand side of length two
according to the following scheme:

$\langle glob, (n_1, loc) \rangle \hookrightarrow \langle glob', (m_0, loc')\,(n_2, loc'') \rangle$

Here $m_0$ is the start node of the called procedure; loc' denotes initial values of
its local variables; loc'' saves the local variables of the calling procedure.
(Notice that no stack symbol contains variables from different procedures; hence
the size of the stack alphabet depends only on the largest number of local
variables in any procedure.) This set is represented by a symbolic rule of the form
$\langle p, n_1 \rangle \hookrightarrow^{R} \langle p, m_0 n_2 \rangle$, where
$R \subseteq (G \times L) \times (G \times L \times L)$.

Return Statements. A return statement has an empty right side:

$\langle glob, (n, loc) \rangle \hookrightarrow \langle glob', \varepsilon \rangle$

These rules correspond to a symbolic rule of the form $\langle p, n \rangle \overset{R}{\hookrightarrow} \langle p, \varepsilon \rangle$, where
$R \subseteq (G \times L) \times G$. Procedures which return values can be simulated by introducing
an additional global variable and assigning the return value to it.

Notice that the size of the symbolic pushdown system may be exponentially 
smaller than the size of the pushdown system. This is the fact we exploit in 
order to make model-checking practically usable, at least for programs with few 
variables. Notice also that in the symbolic pushdown system we have $|P_0| = 1$
and $\Gamma_0$ is the set of nodes of the flow graphs.
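
To make the size gap concrete, the following Python sketch expands one symbolic assignment rule into the explicit pushdown rules it encodes. It is an illustration only, not part of the tool described here: the names `expand_assignment_rule` and `assign_x_from_y` are hypothetical, and a plain Python predicate stands in for the BDD that would represent R.

```python
from itertools import product


def expand_assignment_rule(n1, n2, relation, num_globals, num_locals):
    """Enumerate the explicit rules <g,(n1,l)> -> <g',(n2,l')> of one symbolic rule.

    `relation(g, l, g2, l2)` is a hypothetical predicate standing in for the
    BDD of R; valuations are tuples of 0/1 values.
    """
    G = list(product((0, 1), repeat=num_globals))
    L = list(product((0, 1), repeat=num_locals))
    return [((g, (n1, l)), (g2, (n2, l2)))
            for g, l, g2, l2 in product(G, L, G, L)
            if relation(g, l, g2, l2)]


def assign_x_from_y(g, l, g2, l2):
    """The assignment x := y, with one global x and one local y (assumed example)."""
    return g2 == (l[0],) and l2 == l


rules = expand_assignment_rule("n1", "n2", assign_x_from_y, 1, 1)
print(len(rules))  # one explicit rule per valuation of (x, y)
```

A deterministic assignment yields one explicit rule for each valuation of the variables in scope, so the explicit rule set grows exponentially with the number of boolean variables while the symbolic rule stays a single object.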

Since a symbolic pushdown system is just a compact representation of an
ordinary pushdown system, we continue to use the theory presented in [4]. In this
paper we provide modified versions of the model-checking algorithms that take
advantage of a more compact representation. In our experiments, we consider
programs with boolean variables only and use BDDs to represent them. Integer
variables with values from a finite range are simulated using multiple boolean
variables.



4 Algorithms 

According to Section 0 we can solve the model-checking problem by giving al- 
gorithms for the following three tasks: 

- to compute the set $pre^*(C)$ for a regular set of configurations $C$ (which will
  be applied to $C = Rh\,\Gamma^*$)

- to compute the set $post^*(C)$ for a regular set of configurations $C$ (which will
  be applied to $C = \{c_0\}$)

- to compute the set of repeating heads $Rh$

In [4], efficient implementations for these three problems were proposed for
ordinary pushdown systems. In this section, we sketch how the algorithms may
be lifted to the case of symbolic pushdown systems. More detailed presentations
are given in the full version of the paper [5]. We fix a symbolic pushdown system
$\mathcal{P} = (P_0 \times G, \Gamma_0 \times L, c_0, \Delta_S)$ for the rest of the section.

4.1 Computing Predecessors 

Given a regular set $C$ of configurations of $\mathcal{P}$, we want to compute $pre^*(C)$.
Let $\mathcal{A}$ be a $\mathcal{P}$-automaton that accepts $C$. We modify $\mathcal{A}$ to an automaton that
accepts $pre^*(C)$. The modification procedure adds only new transitions to $\mathcal{A}$,
but no new states are created. Without loss of generality, we assume that $\mathcal{A}$ has
no transitions ending in an initial state.

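In ordinary pushdown systems, new transitions are added according to the
following saturation rule:

    If $\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma_1 \ldots \gamma_n \rangle$ and $p' \xrightarrow{\gamma_1} q_1 \xrightarrow{\gamma_2} \cdots \xrightarrow{\gamma_n} q$
    in the current automaton, add a transition $(p, \gamma, q)$.

The correctness of the procedure was proved in [3]. For the symbolic case,
the corresponding rule becomes:

    If $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', \gamma_1 \gamma_2 \ldots \gamma_n \rangle$ and $p' \xrightarrow[R_1]{\gamma_1} q_1 \xrightarrow[R_2]{\gamma_2} \cdots \xrightarrow[R_n]{\gamma_n} q$ in the
    current automaton, replace $p \xrightarrow[R']{\gamma} q$ by $p \xrightarrow[R'']{\gamma} q$, where

    $R'' = R' \cup \{\, (g, l, g_n) \mid (g, l, g_0, l_1, \ldots, l_n) \in R \;\wedge\; \exists g_1, \ldots, g_{n-1} \colon \forall\, 1 \le i \le n \colon (g_{i-1}, l_i, g_i) \in R_i \,\}$.

The computation of $R''$ can be carried out using standard BDD operations.
A detailed, efficient implementation of the procedure can be found in [5].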
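
The saturation rule for ordinary pushdown systems can be sketched as a naive fixpoint loop. This is a minimal Python illustration under our own encoding (rules and transitions as tuples in sets, `pre_star` a hypothetical name), not the efficient implementation referred to above.

```python
def pre_star(rules, transitions):
    """Saturate an automaton for C so that it accepts pre*(C).

    rules:       set of ((p, gamma), (p2, word)), word being a tuple of stack symbols
    transitions: set of (q, gamma, q2); control locations double as initial states
    """
    trans = set(transitions)
    changed = True
    while changed:
        changed = False
        for (p, gamma), (p2, word) in rules:
            # States reachable from p2 by reading the right-hand side `word`.
            frontier = {p2}
            for sym in word:
                frontier = {q2 for (q, a, q2) in trans if q in frontier and a == sym}
            # Saturation rule: add (p, gamma, q) for every such state q.
            for q in frontier:
                if (p, gamma, q) not in trans:
                    trans.add((p, gamma, q))
                    changed = True
    return trans


# Assumed example: <p,a> -> <p,eps> and <p,b> -> <p,aa>; C = {<p,eps>} (p is final).
example_rules = {(("p", "a"), ("p", ())), (("p", "b"), ("p", ("a", "a")))}
saturated = pre_star(example_rules, set())
print(sorted(saturated))
```

In the example every stack over {a, b} can be emptied, and accordingly the saturated automaton loops on both symbols at its only state. No state is created, only transitions, as the procedure requires.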



4.2 Computing the Repeating Heads 

For ordinary pushdown systems [4] we construct a directed graph whose nodes
are the heads of the transition rules (and so elements of $P \times \Gamma$). There is an edge
from $(p, \gamma)$ to $(p', \gamma')$ iff there is a rule $\langle p, \gamma \rangle \hookrightarrow \langle p'', v_1 \gamma' v_2 \rangle$ where
$\langle p'', v_1 \rangle \Rightarrow^* \langle p', \varepsilon \rangle$ holds. The edge has label 1 iff either $p$ is an accepting Büchi state, or
some derivation $\langle p'', v_1 \rangle \Rightarrow^* \langle p', \varepsilon \rangle$ visits an accepting Büchi state. The edges are
computed using $pre^*$. A head $(p, \gamma)$ is repeating
iff it belongs to a strongly connected component (SCC) containing a 1-labelled
edge. The SCCs are computed in linear time using Tarjan’s algorithm [9].

For symbolic pushdown systems we represent the graph compactly as a symbolic
graph $SG$. The nodes of $SG$ are elements of $P_0 \times \Gamma_0$, and its edges are
annotated with a relation $R \subseteq (G \times L)^2$ (plus a boolean, which is easy to handle
and is omitted in the following discussion for clarity). An edge $(p_0, \gamma_0) \xrightarrow{R} (p_0', \gamma_0')$
stands for the set of edges $((p_0, g), (\gamma_0, l)) \to ((p_0', g'), (\gamma_0', l'))$ such that $(g, l, g', l') \in R$.
Unfortunately, when $R$ is symbolically represented Tarjan’s algorithm cannot be
applied. A straightforward approach is to “saturate” $SG$ instead according to
the following two rules:

- If $(p_0, \gamma_0) \xrightarrow{R} (p_0', \gamma_0') \xrightarrow{R'} (p_0'', \gamma_0'')$, then add $(p_0, \gamma_0) \xrightarrow{R''} (p_0'', \gamma_0'')$, where

  $R'' := \{\, ((g, l), (g', l')) \mid \exists (g'', l'') \colon ((g, l), (g'', l'')) \in R \wedge ((g'', l''), (g', l')) \in R' \,\}$.

- If $(p_0, \gamma_0) \xrightarrow{R} (p_0', \gamma_0')$ and $(p_0, \gamma_0) \xrightarrow{R'} (p_0', \gamma_0')$, then replace these two arcs by
  $(p_0, \gamma_0) \xrightarrow{R \cup R'} (p_0', \gamma_0')$.



The saturation procedure terminates when a fixpoint is reached. It is easy to
see that this algorithm has complexity $O(n \cdot m)$ where $n$ and $m$ are the number
of nodes and edges of $SG$. Using this method, the model-checking problem for
symbolic systems has a worse asymptotic complexity than for normal systems.

In practice, this disadvantage can be made up for, mainly due to the more
succinct representation. Moreover, the straightforward approach can be replaced
with more refined strategies that work better in practice (see the discussion in
Section 5).
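
The two saturation rules on the symbolic graph amount to relational composition and union. The following Python sketch illustrates this with explicit sets of valuation pairs standing in for the BDDs; the function name `saturate` and the dictionary encoding are assumptions made for the illustration, and the boolean label is omitted as in the discussion above.

```python
def saturate(edges):
    """Saturate a symbolic graph.

    edges: dict mapping (source_node, target_node) to a set of pairs (x, y),
           where x and y are valuations of (g, l) before and after the edge.
    """
    edges = {k: set(v) for k, v in edges.items()}
    changed = True
    while changed:
        changed = False
        for (a, b), r1 in list(edges.items()):
            for (b2, c), r2 in list(edges.items()):
                if b != b2:
                    continue
                # First rule: compose R and R' along a path a -> b -> c.
                composed = {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}
                if not composed:
                    continue
                # Second rule: parallel arcs are merged by taking the union.
                current = edges.setdefault((a, c), set())
                if not composed <= current:
                    current |= composed
                    changed = True
    return edges


# Assumed example with two nodes and a single boolean valuation 0/1.
closed = saturate({("A", "B"): {(0, 1)}, ("B", "A"): {(1, 0)}})
print(closed[("A", "A")])
```

After saturation, a pair (x, x) in the relation of a self-loop at a node indicates that the node with valuation x lies on a cycle.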

4.3 Computing Successors 

Given an automaton $\mathcal{A}$ accepting the set $C$, we modify it to an automaton
accepting $post^*(C)$. Again we assume that $\mathcal{A}$ has no transitions leading to initial
states, and moreover, that $|w| \le 2$ holds for all rules $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle$. This is
not an essential restriction, as all systems can be transformed into one of this
form with only a linear increase in size.

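In the ordinary case, we allow $\varepsilon$-moves in the automaton. We write $\overset{\gamma}{\Longrightarrow}$ for
the relation $(\xrightarrow{\varepsilon})^* \xrightarrow{\gamma} (\xrightarrow{\varepsilon})^*$. The algorithm works in two steps [4]:

- If $\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta$, add a state $(p', \gamma')$ and a transition $(p', \gamma', (p', \gamma'))$.

- Add new transitions to $\mathcal{A}$ according to the following saturation rules:

    If $\langle p, \gamma \rangle \hookrightarrow \langle p', \varepsilon \rangle \in \Delta$ and $p \overset{\gamma}{\Longrightarrow} q$ in the current automaton,
    add a transition $(p', \varepsilon, q)$.

    If $\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle \in \Delta$ and $p \overset{\gamma}{\Longrightarrow} q$ in the current automaton,
    add a transition $(p', \gamma', q)$.

    If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta$ and $p \overset{\gamma}{\Longrightarrow} q$ in the current automaton,
    add a transition $((p', \gamma'), \gamma'', q)$.

For the symbolic case, the corresponding first step looks like this: For each
symbolic rule $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', \gamma' \gamma'' \rangle$ we add a new state $(p', \gamma')$. We must adjust
the symbolic transition relation slightly for these new states; e.g. when $q$ and $q'$
are such states, then $\delta_S(q, \gamma, q')$ is a subset of $(G \times L) \times L \times (G \times L)$. Moreover, for
each such rule we add a transition $t = (p', \gamma', (p', \gamma'))$ s.t.
$\delta_S(t) = \{\, (g', l', (g', l')) \mid \exists (g, l, g', l', l'') \in R \,\}$. Concerning $\varepsilon$-transitions, $\delta_S(q, \varepsilon, q')$ is a subset of $G \times G$.
In the second step, we proceed as follows:

    If $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', \varepsilon \rangle \in \Delta_S$ and $p \overset{\gamma}{\underset{R'}{\Longrightarrow}} q$ in the current automaton,
    add to $\delta_S(p', \varepsilon, q)$ the set $\{\, (g', g'') \mid \exists (g, l, g') \in R,\ (g, l, g'') \in R' \,\}$.

    If $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', \gamma' \rangle \in \Delta_S$ and $p \overset{\gamma}{\underset{R'}{\Longrightarrow}} q$ in the current automaton,
    add to $\delta_S(p', \gamma', q)$ the set $\{\, (g', l', g'') \mid \exists (g, l, g', l') \in R,\ (g, l, g'') \in R' \,\}$.

    If $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', \gamma' \gamma'' \rangle \in \Delta_S$ and $p \overset{\gamma}{\underset{R'}{\Longrightarrow}} q$, add to $\delta_S((p', \gamma'), \gamma'', q)$
    the set $\{\, ((g', l'), l'', g'') \mid \exists (g, l, g', l', l'') \in R,\ (g, l, g'') \in R' \,\}$.

In [5] we present an efficient implementation of these rules.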
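
For the ordinary case, the two steps can be sketched as follows. This is a naive Python illustration under an assumed encoding (`EPS` marks epsilon-moves, `post_star` is a hypothetical name); it recomputes the relation $\overset{\gamma}{\Longrightarrow}$ from scratch in every round and is therefore far from the efficient implementation mentioned above.

```python
EPS = None  # label used for epsilon-moves in this sketch


def post_star(rules, transitions):
    """Saturate an automaton for C so that it accepts post*(C).

    rules:       set of ((p, gamma), (p2, word)) with len(word) <= 2
    transitions: set of (q, label, q2)
    """
    trans = set(transitions)
    # Step 1: a new state (p2, gamma1) for each rule with a right-hand side of length two.
    for _, (p2, word) in rules:
        if len(word) == 2:
            trans.add((p2, word[0], (p2, word[0])))

    def eps_closure(states):
        closure, todo = set(states), list(states)
        while todo:
            q = todo.pop()
            for (q1, a, q2) in trans:
                if q1 == q and a is EPS and q2 not in closure:
                    closure.add(q2)
                    todo.append(q2)
        return closure

    def reach(p, gamma):
        """States q with p ==gamma==> q, i.e. (eps)* gamma (eps)*."""
        before = eps_closure({p})
        after = {q2 for (q, a, q2) in trans if q in before and a == gamma}
        return eps_closure(after)

    # Step 2: apply the three saturation rules until nothing changes.
    changed = True
    while changed:
        changed = False
        for (p, gamma), (p2, word) in rules:
            for q in reach(p, gamma):
                if len(word) == 0:
                    new = (p2, EPS, q)
                elif len(word) == 1:
                    new = (p2, word[0], q)
                else:
                    new = ((p2, word[0]), word[1], q)
                if new not in trans:
                    trans.add(new)
                    changed = True
    return trans


# Assumed example: <p,a> -> <q,bc>, <q,b> -> <p,eps>; C = {<p,a>} with final state f.
example = {(("p", "a"), ("q", ("b", "c"))), (("q", "b"), ("p", ()))}
result = post_star(example, {("p", "a", "f")})
```

In the example the resulting automaton accepts the three reachable configurations, with stack contents a (at p), bc (at q) and c (at p, via an epsilon-move).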

4.4 Complexity Analysis

Let $\mathcal{P} = (P, \Gamma, c_0, \Delta)$ be an ordinary pushdown system, and let $\mathcal{B}$ be a Büchi
automaton corresponding to the negation of an LTL formula $\varphi$. Then, according
to [4], the model-checking problem for $\mathcal{P}$ and $\mathcal{B}$ can be solved in $O(|P|^2 \cdot |\Delta| \cdot |\mathcal{B}|^3)$
time and $O(|P| \cdot |\Delta| \cdot |\mathcal{B}|^2)$ space.

Consider a pushdown system representing a sequential program with procedures.
Let $n$ be the size of a program’s control flow, i.e. the number of statements.
Let $m_1$ be the number of global (boolean) variables, and let $m_2$ be the maximum
number of local (boolean) variables in any procedure. Assuming that the programs
use deterministic assignments to variables, each statement translates to
$2^{m_1+m_2}$ different pushdown rules. Since the number of control locations is $2^{m_1}$,
we would get an $O(n \cdot 2^{3m_1+m_2} \cdot |\mathcal{B}|^3)$ time and $O(n \cdot 2^{2m_1+m_2} \cdot |\mathcal{B}|^2)$ space
algorithm by translating the program to an ordinary pushdown system.

When we use symbolic systems, the complexity gets worse. The graph $SG$ has
$O(|\Delta|)$ nodes and $O(|P| \cdot |\Delta|)$ edges. So our symbolic algorithm for computing
the SCCs has complexity $O(|P| \cdot |\Delta|^2)$. We therefore get $O(n^2 \cdot 2^{3m_1+2m_2} \cdot |\mathcal{B}|^3)$
time in the symbolic case. (The space complexity remains the same.) However,
as mentioned before, the more compact representation in the symbolic case
compensates for this disadvantage in the examples we tried.
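
The exponents in these bounds follow by substitution. As a worked check, assuming $|P| = 2^{m_1}$ control locations and $|\Delta| = n \cdot 2^{m_1+m_2}$ rules as stated above, and taking the bounds $O(|P|^2 \cdot |\Delta| \cdot |\mathcal{B}|^3)$ and $O(|P| \cdot |\Delta| \cdot |\mathcal{B}|^2)$ for ordinary systems as given:

```latex
\begin{align*}
|P|^2 \cdot |\Delta| \cdot |\mathcal{B}|^3
  &= 2^{2m_1} \cdot n \cdot 2^{m_1+m_2} \cdot |\mathcal{B}|^3
   = n \cdot 2^{3m_1+m_2} \cdot |\mathcal{B}|^3
   && \text{(time, ordinary)}\\
|P| \cdot |\Delta| \cdot |\mathcal{B}|^2
  &= 2^{m_1} \cdot n \cdot 2^{m_1+m_2} \cdot |\mathcal{B}|^2
   = n \cdot 2^{2m_1+m_2} \cdot |\mathcal{B}|^2
   && \text{(space)}\\
|P| \cdot |\Delta|^2
  &= 2^{m_1} \cdot n^2 \cdot 2^{2m_1+2m_2}
   = n^2 \cdot 2^{3m_1+2m_2}
   && \text{(SCC computation, symbolic)}
\end{align*}
```

The symbolic bound thus differs from the ordinary one by a factor of $n \cdot 2^{m_2}$.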

5 Efficient Implementation 

We have implemented the algorithms of Section 4 in a model-checking tool.
Three refinements with respect to the abstract description of the algorithms are
essential for efficiency.

Procedure for the Model-Checking Problem. As mentioned earlier, the
model-checking problem reduces to (a) checking whether $c_0 \in pre^*(Rh\,\Gamma^*)$, or
(b) checking whether $post^*(\{c_0\}) \cap Rh\,\Gamma^* \neq \emptyset$. In order to compute (b)
symbolically, we first compute the reachable configurations (i.e., $post^*(\{c_0\})$). Then, in
each symbolic rule $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', \gamma_1 \ldots \gamma_n \rangle$ we replace $R$ by a new relation $R_{reach}$
defined as follows: $(g, l, g', l_1, \ldots, l_n) \in R_{reach}$ if $(g, l, g', l_1, \ldots, l_n) \in R$ and some
configuration $\langle (p, g), (\gamma, l) w \rangle$ is reachable from $c_0$. This dramatically reduces the
effort needed for some computations if the number of reachable variable
valuations is much smaller than the number of possible valuations. In this case,
much of the work in (a) would be spent on finding cycles among unreachable
valuations.
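
The restriction to reachable valuations is a simple filter on each rule's relation. The sketch below is in Python with explicit sets instead of BDDs; `restrict_to_reachable` and the shape of `reachable_heads` are assumptions for the illustration, the latter presumed to have been extracted beforehand from the automaton for $post^*(\{c_0\})$.

```python
def restrict_to_reachable(relation, reachable_heads, p, gamma):
    """Compute R_reach for a symbolic rule with left-hand side <p, gamma>.

    relation:        set of tuples (g, l, g2, l_1, ..., l_n) representing R
    reachable_heads: set of (p, g, gamma, l) such that some configuration
                     <(p, g), (gamma, l) w> is reachable (assumed to be given)
    """
    return {t for t in relation if (p, t[0], gamma, t[1]) in reachable_heads}


# Assumed example: only the valuation g = 0, l = 0 is reachable at head (p, n1).
R = {(0, 0, 1, 0), (1, 0, 0, 0)}
R_reach = restrict_to_reachable(R, {("p", 0, "n1", 0)}, "p", "n1")
```

With BDDs the same effect would be obtained by conjoining R with the characteristic function of the reachable source valuations.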

Efficient Computation of the Repeating Heads. As mentioned in Section 4.2, the
computation of the repeating heads reduces to determining the SCCs of a graph
symbolically represented as a labelled graph $SG$. The nodes of $SG$ are elements
of $P_0 \times \Gamma_0$, and its edges are annotated with a relation $R \subseteq (G \times L)^2$ (and a
boolean). In our implementation, we first compute the components “roughly”,
i.e., ignoring the relations $R$ on the edges, using Tarjan’s algorithm. Then we refine the
search (including the relations) within the components. For this problem a number
of different approaches could be tried. The algorithm of Section 4.2 corresponds
to computing the transitive closure of the edges. The transitive closure can be
computed using a stepwise computation or iterative squaring (see also [7]); the
stepwise method seems to work better in general. Xie and Beerel [10] suggest a
more sophisticated approach for searching the components in a symbolic setting.
Moreover, these possibilities can be combined with a preprocessing of the edge
relation. The preprocessing looks for BDD variables that can only change their values
from 0 to 1 (or vice versa), but not in the other direction, and removes the
edges that change such variables, effectively limiting the length of the paths in the graph.

Variable Ordering. It is well known that the performance of BDD-based algo- 
rithms is very sensitive to the variable ordering. When checking the Quicksort 
example (see below) we found that a useful variable ordering was to place the 
inputs (i.e. the array of data to be sorted) at the end and the ‘control variables’ 
(i.e. indices into the array) at the beginning. Our intuition for this is that every 
instruction changes at most two elements of the array, and that such changes 
can be described with small BDDs. So we need one such BDD for each of the 
(relatively few) possible valuations of the control variables. If the input data 
was placed at the beginning, the BDDs would first branch into the (relatively 
many) possible valuations of the input data. While it is difficult to make a gen- 
eral assessment of variable orderings, there is hope that this ordering would also 
be useful in other examples where the same division between inputs and con- 
trol variables can be made. Since the inputs are stored in global variables, this 
criterion corresponds to placing the local variables before the global variables. 

In the rest of the section we give an idea of the performance of the algorithm 
by applying it to some versions of Quicksort. Then we show the impact of the 
three improvements listed above by presenting the running times when one of 
the improvements is switched off. All computations were carried out on an
Ultrasparc 60 with 1.5 GB memory. Operations on BDDs were implemented using
the CUDD package [8].

5.1 Quicksort 

We intend to sort the global array a in ascending order; a call to the quicksort 
function in Figure 2 should sort the fragment of the array starting at index left
and ending at index right. The program is parametrised by two variables: n, the 
number of bits used to represent the integer variables, and m, the number of array 
entries. We are interested in two properties: first, all executions of the program 
should terminate, and secondly, all of them should sort the array correctly. 

Termination. For this property we can abstract from the actual array contents
and just regard the local variables. The program in Figure 2 is faulty; it is not
guaranteed to terminate (finding the fault is left as an exercise to the reader). A
corrected version (containing one more integer variable) is easy to obtain from
the counterexample provided by our checker. Figure 2 lists some experimental
results. For each n, we list the number of resulting local variables in terms of
booleans. Since the array contents are abstracted away here, there are no global
variables, and m does not play a role.



    void quicksort (int left, int right)
    {
      int lo, hi, piv;

      if (left >= right) return;
      piv = a[right]; lo = left; hi = right;
      while (lo <= hi) {
        if (a[hi] > piv) {
          hi--;
        } else {
          swap a[lo], a[hi];
          lo++;
        }
      }
      quicksort (left, hi);
      quicksort (lo, right);
    }

                         n    locals    time        memory
    faulty version       3    12        0.14 s      4.6 M
                         4    16        0.39 s      5.3 M
                         5    20        1.37 s      7.2 M
                         6    24        6.86 s      10.5 M
                         7    28        53 s        12.3 M
                         8    32        592 s       14.6 M
                         9    36        > 3600 s    -
    corrected version    3    15        0.22 s      4.8 M
                         4    20        0.67 s      6.1 M
                         5    25        3.63 s      9.4 M
                         6    30        48.67 s     14.7 M
                         7    35        1238 s      15.1 M
                         8    40        > 3600 s    -

Fig. 2. Left: Faulty Version of Quicksort. Right: Results for Termination Check. 



                                normal                randomised
    n    m    globals    locals    time      memory      time        memory
    3    4    12         18        1 s       7.2 M       1 s         8.0 M
    3    5    15         18        4 s       14.5 M      8 s         15.2 M
    3    6    18         18        38 s      22.3 M      82 s        29.9 M
    4    4    16         24        3 s       12.1 M      6 s         12.3 M
    4    5    20         24        24 s      18.7 M      48 s        25.1 M
    4    6    24         24        193 s     77.4 M      531 s       134 M
    4    7    28         24        1742 s    414 M       > 3600 s    -


Fig. 3. Results for Correctness of Sorting. 



Correctness of the Sorting. In this case we also need to model the array contents
as global variables. Figure 3 lists the results for the corrected version of the
algorithm in Figure 2 as well as for a variant in which the pivot element is
chosen randomly.

Impact of the Improvements. Figure 4 shows the impact of the three improvements
in the task of checking the correctness of Quicksort. We consider the
non-randomised version with n = 3 and m = 4. The line NONE contains the
reference values when all improvements are present. The lines VORD and PROC
give the time and space consumption when the improvements concerning variable
ordering and procedure for solving the model-checking problem are “switched
off”, respectively. More precisely, in the VORD line we use a BDD ordering
corresponding to the order left, right, lo, hi, piv (i.e. all BDD variables used for
representing left before and after a program step come before those for
representing right etc.) plus automatic reordering. In the PROC line we compute
$pre^*(Rh\,\Gamma^*)$ instead of $post^*(\{c_0\}) \cap Rh\,\Gamma^*$.

            time      memory                           preprocessing
    NONE    1.02 s    7.2 M                            with      w/o
    VORD    49 s      6.8 M         closure            0.40 s    213 s
    PROC    624 s     60.6 M        method of [10]     35 s      14 s

Fig. 4. Impact of the Improvements.


In the right part of the figure we show results for different methods of computing
the repeating heads. In all cases we first computed the ‘rough’ components
based on control flow information. We tried the transitive closure approach and
the method of [10], both with and without the preprocessing described earlier.
The times are for the computation of the heads only. In these experiments,
the preprocessing combined with a transitive closure computation worked best,
followed by the method of [10] without preprocessing; interestingly, using [10]
combined with preprocessing led to worse results.

In this example, the times achieved by the model checker would not be possi- 
ble without the symbolic representation of the variables. The translation into a 
normal pushdown system would create thousands or even millions of rules, and 
in a test we made just creating these took far longer than the model-checking 
with the symbolic approach. 

6 Comparison with Bebop 

In [1], Ball and Rajamani used the following example (see Figure 5) to test
their reachability checker Bebop. The example consists of one main function
and n functions called level_i, $1 \le i \le n$, for some n > 0. There is one global
variable g. Function main calls level_1 twice. Every function level_i checks g; if
it is true, it increments a 3-bit counter to 7, otherwise it calls level_{i+1} twice.
Before returning, level_i negates g. The checker is asked to find out if the labelled
statement in main is reachable, i.e. if g can end with a value of false. Since g is
not initialised, the checker has to consider both possibilities.

Despite the example’s simplicity, some of its features are worth pointing out.
There is no recursion in the program, and so its state space is finite. However, 
typical finite-state approaches would flatten the procedure call hierarchy, blowing 
up the program to an exponential size. Moreover, the program has exponentially 
many states, yet we can solve the reachability question in time linear in n. Finally, 
there are 0{n) different variables in the program; however, only two of them are 
in scope at any given time. For this reason, we can keep the stack alphabet very 
small, exploiting the locality inherent in the program’s structure. 
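
The linear-time claim can be illustrated with procedure summaries. The following Python sketch is our own illustration, not the algorithm of the checker: it computes, for each level_i, the possible values of g on return as a function of g on entry, processing the innermost procedure first. The names `level_summaries` and `label_reachable` are hypothetical, and the 3-bit counter is ignored because it does not influence g.

```python
def level_summaries(n):
    """summary[i][g_in] = set of possible values of g when level_i returns."""
    summary = {}
    for i in range(n, 0, -1):  # innermost procedure first
        summary[i] = {}
        for g_in in (False, True):
            if g_in or i == n:
                # Counter loop, or neither branch taken: g is unchanged so far.
                after_body = {g_in}
            else:
                # Two consecutive calls to level_{i+1}.
                first = summary[i + 1][g_in]
                after_body = set().union(*(summary[i + 1][g1] for g1 in first))
            # Every level negates g before returning.
            summary[i][g_in] = {not g for g in after_body}
    return summary


def label_reachable(n):
    """Is the labelled statement in main reachable, i.e. can g end as false?"""
    s = level_summaries(n)[1]
    finals = set()
    for g0 in (False, True):  # g is not initialised
        for g1 in s[g0]:
            finals |= s[g1]
    return False in finals


answer = label_reachable(800)
```

Each level is summarised once, with a constant amount of work, so the computation is linear in n even though the flattened call tree has exponentially many calls.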

Running times for different values of n are listed in Figure 5. In [1], a running
time of four and a half minutes using the CUDD package and one and a half
minutes with the CMU package is reported for n = 800, but unfortunately
the paper does not say on which machine. More significant is the comparison of
space consumption. We have a peak number of 155 live BDD nodes, independent
of n. In contrast, Bebop’s space consumption for BDDs increases linearly,
reaching more than 200,000 live BDD nodes for n = 800. The reason for this
difference is that our BDDs require 4 variables (one for the global variable g and
three for the 3-bit counter in scope), while Bebop’s BDDs require 2401 variables
(one variable for g and 2400 for the 800 3-bit counters). Since [1] does not describe
the model checking algorithm in detail, we cannot say if this difference in the
number of BDD variables is inherent to the algorithms or due to a suboptimal
implementation.

    bool g;

    void main() {
      level_1();
      level_1();
      if (!g) {
        reach: skip;
      }
    }

    void level_i() {
      int (0..7) i;
      if (g) {
        i = 0;
        while (i < 7) i++;
      } else if (i < n) {
        level_{i+1}();
        level_{i+1}();
      }
      g = !g;
    }

    n       time
    200     0.50 s
    400     0.94 s
    600     1.46 s
    800     1.99 s
    1000    2.41 s
    2000    4.85 s
    5000    13.63 s

Fig. 5. Left: The Example Program. Right: Experimental Results.



7 Conclusions 

We have presented a model-checker to verify arbitrary LTL-properties of boolean 
programs with (possibly recursive) procedures. To the best of our knowledge this 
is the first checker able to deal with liveness properties. The Bebop model checker 
by Ball and Rajamani, the closest to ours, can also deal with recursive boolean 
programs, but it can only check safety properties.

Our checker works on a model called symbolic pushdown systems (SPDSs).
While this model is definitely more abstract than Bebop’s input language, a
translation of the former into the latter is simple (see Section 3).

Moreover, having SPDSs as input allows us to make use of the efficient
automata-based algorithms described in [4], which leads to some efficiency
advantages. In particular, the maximal number of variables in our BDDs depends
only on the maximal number of local variables of the procedures, and not on the
recursion depth of the program.

Another interesting feature of the reachability algorithms of our checker is
that they can be used to compute the set of reachable configurations of the
program, i.e. we obtain a complete description of all the reachable pairs of the
form (control point, stack content). This makes them applicable to some security
problems of Java programs which require precisely this feature [6]. Even more
generally, we can compute the set of reachable configurations from any regular
set of initial configurations.



Acknowledgements 

Many thanks to Ahmed Bouajjani for helpful discussions on how to obtain
symbolic versions of the algorithms of [4] and to one anonymous referee for
interesting comments and suggestions.

References 

1. T. Ball and S. K. Rajamani. Bebop: A symbolic model checker for boolean pro- 
grams. In SPIN 00: SPIN Workshop, LNCS 1885, pages 113-130, 2000. 

2. T. Ball and S. K. Rajamani. Automatically validating temporal safety properties 
of interfaces. Technical report, 2001. 

3. A. Bouajjani, J. Esparza, and O. Maler. Reachability analysis of pushdown au- 
tomata: Application to model-checking. In Proceedings of CONCUR ’97, LNCS 
1243, pages 135-150, 1997. 

4. J. Esparza, D. Hansel, P. Rossmanith, and S. Schwoon. Efficient algorithms for 
model checking pushdown systems. In Proceedings of CAV ’00, LNCS 1855, 2000. 

5. J. Esparza and S. Schwoon. A BDD-based Model Checker for Recursive Programs.
Technical report, Institut für Informatik, Technische Universität München, 2001.
Available at http://www7.in.tum.de/gruppen/theorie/publications/.

6. T. Jensen, D. L. Metayer, and T. Thorn. Verification of control flow based security 
properties. In Proceedings of 1999 IEEE Symposium on Security and Privacy, 
IEEE Press, 1999. 

7. J.R. Burch, E.M. Clarke, D.E. Long, K.L. MacMillan, and D.L. Dill. Symbolic 
model checking for sequential circuit verification. IEEE Transactions on Computer- 
Aided Design of Integrated Circuits and Systems, 13(4):401-424, 1994. 

8. F. Somenzi. Colorado University Decision Diagram Package. Technical report,
University of Colorado, Boulder, 1998.

9. R. E. Tarjan. Depth first search and linear graph algorithms. In SICOMP 1, pages 
146-160, 1972. 

10. A. Xie and P. A. Beerel. Implicit enumeration of strongly connected components. 
In Proceedings of ICCAD, pages 37-40, San Jose, CA, 1999. 



Model Checking the World Wide Web



Luca de Alfaro 

Department of Electrical Engineering and Computer Sciences 
University of California at Berkeley, Berkeley, CA 94720-1770, USA 
dealfaro@eecs.berkeley.edu



Abstract. Web design is an inherently error-prone process. To help with the
detection of errors in the structure and connectivity of Web pages, we propose to
apply model-checking techniques to the analysis of the World Wide Web. Model
checking the Web is different in many respects from ordinary model checking of
system models, since the Kripke structure of the Web is not known in advance, but
can only be explored in a gradual fashion. In particular, the model-checking
algorithms cannot be phrased in ordinary μ-calculus, since some operations, such as
the computation of sets of predecessor Web pages and the computations of
greatest fixpoints, are not possible on the Web. We introduce constructive μ-calculus,
a fixpoint calculus similar to μ-calculus, but whose formulas can be effectively
evaluated over the Web; and we show that its expressive power is very close
to that of ordinary μ-calculus. Constructive μ-calculus can be used not only for
phrasing Web model-checking algorithms, but also for the analysis of systems
having a large, irregular state space that can be only gradually explored, such
as software systems. On the basis of these ideas, we have implemented the Web
model checker MCWEB, and we describe some of the issues that arose in its
implementation, as well as the type of errors that it was able to find.



1 Introduction 



The design of a Web site is an inherently error-prone process. A Web site must be 
correctly designed both at a local and at a global level. Good design at the local level 
implies that the pages contain well-formed HTML code, have the intended visual
appearance, and have no broken links. Several tools are available for checking such local
properties, either on single pages, or more commonly by crawling over an entire Web
site. Good design at the global level
requires that the Web site satisfies properties concerning its connectivity and cost of
traversal, as well as properties that depend on the path followed to reach the pages, 
rather than on the pages only. Examples of such global properties are that every page 
of a Web site is reachable from all other pages, and that all paths from the main page 
to pages with confidential information must go through an access control page. Current
Web verification tools focus essentially on local properties. On the other hand, model
checking has proved to be an effective technique for the specification and verification
of global properties of the large graphs that correspond to the state-space and transition
relation of systems. Hence, it is natural to ask whether model checking can be
applied to the analysis of global properties of Web sites. This paper answers this
question affirmatively, by showing how model-checking techniques can be adapted to the
analysis of the Web, and by illustrating which types of errors are amenable to discovery
with such techniques. In particular, we show that model-checking techniques can be 
used for the analysis of the following three classes of properties: 

- Connectivity properties. Connectivity properties refer to the graph structure of a 
Web site. 

- Frame properties. Since each link loads only a portion of a frame-based page, the 
content of a frame-based page depends on the path followed by the browser in a 
site: hence, frame properties are essentially path properties. 

- Cost properties. Cost properties refer to the number of links or bytes, that must be 
followed or downloaded while browsing a Web site. An example consists in the 
computation of the all-pair longest path in a Web site. 

Model-checking methods can be broadly classified in enumerative methods and
symbolic methods. Enumerative methods operate on states as the basic entities,
and represent sets of states and transition relations in terms of the individual states.
Symbolic methods operate directly on symbolic representations of sets of states.

Our approach to the model checking of the Web is enumerative, in that we
represent sets of Web pages as collections of single pages. However, we argue that it is
convenient to phrase our model-checking algorithms as symbolic algorithms, based on
the manipulation of sets of Web pages. In fact, a set-based approach lends itself better
to parallelization: given a set S of Web pages, the computation of the set Post(S) of
Web pages that can be reached from S by following one link can proceed largely in
parallel, by following simultaneously all links originating from pages in S. Since the
page fetch time on the Web is typically dominated by response time, rather than
transfer time, such a parallel approach is significantly more efficient than a sequential one.
Nevertheless, the model checking of the Web differs in several respects from usual
symbolic model checking. In particular, some of the basic operations performed by standard
model-checking methods cannot be performed on the Web:

1. Given a predicate P defining a property of Web pages, we cannot construct the set
   $S_P$ consisting of all the Web pages that satisfy P.

2. Given a set S of Web pages, we cannot construct the set Pre(S) of pages that can
   reach some page of S by following one link.*

3. The set V of all Web pages is not known in advance. Likewise, given a set $U \subseteq V$
   of Web pages, we cannot construct its complement $V \setminus U$.

These limitations imply in particular that we cannot phrase our model-checking
algorithms in standard μ-calculus. In fact, Limitation 3 implies that we cannot
evaluate expressions that involve the greatest fixpoint operator ν: in $\nu x.\phi(x)$, we cannot
set $x_0 = V$ in order to compute the limit $\lim_{k \to \infty} x_k$ of $x_{k+1} = \phi(x_k)$, for $k \ge 0$.
Limitation 1 implies that we must introduce restrictions in the use of predicates, and
Limitation 2 prevents us from using the standard predecessor operator Pre.

* Search engines such as Google do in fact provide such a service, but the answer they provide
is only approximate.

We introduce constructive μ-calculus, a fixpoint calculus similar to equational
μ-calculus, but containing only expressions and constructs that can be effectively
evaluated within the above limitations. Constructive μ-calculus differs from standard
equational μ-calculus in the following respects:

- The greatest fixpoint operator ν is replaced by the operator $\nu_x$, where x is a set of
  states that must be already known, and that acts as the “universe set” in which the
  largest fixpoint is computed.

- The predecessor operator Pre is replaced by its guarded version GPre(U, W),
  defined by GPre(U, W) = U ∩ Pre(W). Since the pages in U are already known
  when GPre(U, W) is evaluated, all links from U to W are also known, ensuring
  that GPre(U, W) can be computed.

- Predicates cannot be used to generate sets of states, but only to select from existing
  sets the states that satisfy given propositional formulas.

We show that these restrictions are enough to ensure that the expressions of constructive
μ-calculus are effectively computable, and we provide a precise characterization of the
expressive power of constructive μ-calculus. In particular, we show that in spite of the
above differences, the expressive power of constructive μ-calculus is essentially the
same as the one of ordinary μ-calculus. We phrase our Web model-checking algorithms
in constructive μ-calculus.
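
The set operations that remain available can be sketched over a partially explored link graph. In the Python illustration below, `links` is a hypothetical in-memory dictionary from already fetched pages to the sets of their link targets, standing in for actual page fetches; the function names `post` and `gpre` are ours.

```python
def post(links, pages):
    """Post(S): pages reachable from `pages` by following one link.

    Only the links of pages in `pages` are consulted, so each page could be
    fetched independently (and hence in parallel).
    """
    return set().union(*(links[p] for p in pages))


def gpre(links, U, W):
    """Guarded predecessor GPre(U, W) = U ∩ Pre(W).

    Only links originating in the known set U are inspected, so no page
    outside U needs to be discovered.
    """
    return {u for u in U if links[u] & W}


# Assumed toy site: a -> b -> c.
site = {"a": {"b"}, "b": {"c"}, "c": set()}
print(post(site, {"a", "b"}), gpre(site, {"a", "b"}, {"c"}))
```

Note that `gpre` never needs the full predecessor set of W, which is exactly the operation that cannot be performed on the Web.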

We argue that the limitations above are not peculiar to the Web, but are shared by a
large class of systems that have a large or infinite state space without regular structure,
among which software programs. In the analysis of complex programs, we often have
no way of constructing in advance the set of all states, and we may not know what
are the predecessors of a given set of states unless we have already encountered those
states in the course of the model checking. Constructive μ-calculus is well-suited for
phrasing model-checking algorithms that operate on-the-fly over irregular graphs that
can be explored only gradually.

In order to experiment with Web model checking, we have implemented the model 
checker MCWEB, which enables the analysis of connectivity, frame, and cost properties 
of Web sites. We report our experience in using MCWEB, and we summarize the most 
common classes of errors that we were able to find using it. 

2 The Graph Structure of the Web

A first step in the application of model-checking techniques to the Web consists
in fixing a graph structure of the Web. The simplest choice consists in disregarding the
frame structure of the Web, and in modeling the Web as a graph of pages, with links due
to both a (anchor) and frame (sub-frame) tags as edges. We call this the flat model of
the Web. The flat model suffices for many purposes, among which broken link detection 
and HTML consistency analysis, and indeed many current tools for Web analysis rely 
on the flat model. Nevertheless, some reachability properties cannot easily be checked 
on the flat model. For example, the property that the home page of a site is reachable 
from all pages of the site is often not true in the flat model of a frame-based site, since 
the link to the home page may be in a separate frame (and thus, a separate graph node) 
than the main content of the page. For this reason, our graph model of the Web takes 
into account the frame structure of the pages, unlike the flat model. 

2.1 URLpages and Webnodes

An URL $a$ is a string that uniquely specifies a document on the Web; it is composed of a
protocol field (such as HTTP), a domain name, and a document locator on the domain.
In this paper, we restrict our attention to URLs that refer to the HTTP protocol. Given an
URL $a$, we can fetch the corresponding document $s = \mathit{GetUrl}(a)$; we call $s$ the URLpage
corresponding to the URL $a$. The URLpage $s = (\rho_s, h_s, F_s, A_s)$ consists in the
URL $\rho_s$ from which the document is retrieved, the textual content $h_s$, a set of frame tags
$F_s$, and a set of anchor tags $A_s$. The URL $\rho_s$ may be different from $a$ due to automatic
redirection, as effected by the HTTP protocol. Since images are loaded automatically
by most current browsers, we consider them to be an integral part of $h_s$, even though
they are specified by separate anchors. A frame tag $(b, \ell)$ consists of the URL $b$ to be
loaded into the subframe, and of a name $\ell$ used to label the subframe. An anchor tag
$(b, \ell)$ consists of the URL $b$ specifying the link destination, and of a target name $\ell$,
which specifies in which subframe the new URL should be loaded (if no target
is specified, we take $\ell$ to be the empty string $\varepsilon$). While this is only a subset of
the tags and attributes that occur in HTML documents, it will suffice for our purpose of
defining the graph structure of the Web.

The nodes of our graph model of the Web, called the webgraph, consist in webnodes.
A webnode $w$ is a tree with URLpages as nodes; the edges of the tree are labeled
by frame names. We write $s \in w$ to denote that an URLpage $s$ is a node in the tree
$w$. If $s \in w$ and $F_s = \{(a_1, \ell_1), \ldots, (a_n, \ell_n)\}$, then the URLpage $s$ has $n$ URLpages
$t_1, \ldots, t_n$ as children in $w$; for $1 \le i \le n$, the edge from $s$ to $t_i$ is labeled
with $\ell_i$. Given an URLpage $t$, the webnode $w = \mathit{WebNode}(t)$ is obtained by “loading”
recursively all the subframes of $t$. Precisely, $w$ consists in a tree with root $t$,
where each URLpage $s \in w$ has as children the set $\{\mathit{GetUrl}(a) \mid (a, \ell) \in F_s\}$ of
URLpages corresponding to the subframes of $s$. For brevity, given an URL $a$ we define
$\mathit{GetWeb}(a) = \mathit{WebNode}(\mathit{GetUrl}(a))$.
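
The recursive loading of subframes can be illustrated with a minimal Python sketch. It is not taken from MCWEB: the names `URLPage`, `WebNode`, `FAKE_WEB`, `get_url`, `web_node`, and `get_web` are hypothetical, the in-memory dictionary stands in for real HTTP fetches, and the `max_depth` guard is an added assumption that keeps the recursion finite on cyclic frame structures.

```python
from dataclasses import dataclass, field


@dataclass
class URLPage:
    url: str                                      # rho_s: URL the document was retrieved from
    text: str                                     # h_s: textual content
    frames: list = field(default_factory=list)    # F_s: (url, frame name) pairs
    anchors: list = field(default_factory=list)   # A_s: (url, target name) pairs


@dataclass
class WebNode:
    page: URLPage
    children: list = field(default_factory=list)  # (frame name, WebNode) pairs


# Hypothetical stand-in for the Web: maps an URL to the URLpage it serves.
FAKE_WEB = {
    "http://site.example/": URLPage(
        "http://site.example/", "frameset",
        frames=[("http://site.example/nav.html", "nav"),
                ("http://site.example/main.html", "main")]),
    "http://site.example/nav.html": URLPage(
        "http://site.example/nav.html", "menu",
        anchors=[("http://site.example/", "_top")]),
    "http://site.example/main.html": URLPage(
        "http://site.example/main.html", "welcome"),
}


def get_url(a):
    """GetUrl(a): fetch the URLpage for URL a (here, a dictionary lookup)."""
    return FAKE_WEB[a]


def web_node(page, depth=0, max_depth=8):
    """WebNode(t): the tree obtained by recursively loading the subframes of t."""
    node = WebNode(page)
    if depth < max_depth:  # assumed cut-off, not part of the definition
        for url, name in page.frames:
            child = web_node(get_url(url), depth + 1, max_depth)
            node.children.append((name, child))
    return node


def get_web(a):
    """GetWeb(a) = WebNode(GetUrl(a))."""
    return web_node(get_url(a))


w = get_web("http://site.example/")
print([name for name, _ in w.children])  # ['nav', 'main']
```

Children are kept as a list of pairs rather than a dictionary so that duplicated frame names, which the text treats as an error to be detected, are not silently merged.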

The edges of the webgraph correspond to page links; the precise definition takes
into account the way in which pages are loaded into subframes. Given a webnode
$w$, and an URLpage $s \in w$, we denote by $\mathit{subtree}(w, s)$ the subtree of $w$ with root
in $s$. Given a webnode $w$, an URLpage $s \in w$, and a link $(a, \ell) \in A_s$, we denote
by $\mathit{target}(w, s, \ell)$ the subtree of $w$ that will be replaced by the webnode $\mathit{GetWeb}(a)$
when the link $(a, \ell)$ is followed, defined according to the HTML standard [16, 18]. Precisely,
if $\ell = \texttt{\_blank}$ or $\ell = \texttt{\_top}$, we have $\mathit{target}(w, s, \ell) = w$; if $\ell = \texttt{\_self}$ or
$\ell = \varepsilon$, then $\mathit{target}(w, s, \ell) = \mathit{subtree}(w, s)$; if $\ell = \texttt{\_parent}$, then $\mathit{target}(w, s, \ell) =
\mathit{subtree}(w, t)$, where $t$ is the parent of $s$ in the tree $w$. Finally, if $\ell$ is any other string,
then $\mathit{target}(w, s, \ell) = \mathit{subtree}(w, t)$, where $t$ is the unique URLpage such that the link
in $w$ from the parent of $t$ to $t$ is labeled $\ell$; if there is no such link, or if the link is not
unique, then we treat the link as a “broken link”, and we take as its destination a special
error webnode. Given a webnode $w$, a subtree $u$ of $w$, and another webnode $v$, we
denote by $w[v/u]$ the result of replacing $u$ by $v$ in $w$. Given a webnode $w$ and an URLpage
$s \in w$, the destination of a non-broken link $(a, \ell) \in A_s$ consists in the webnode
$\mathit{dest}(w, s, (a, \ell)) = w[\mathit{GetWeb}(a)/\mathit{target}(w, s, \ell)]$.
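
The case analysis for the link target can be sketched in Python as follows. The `Frame` class and the functions `all_frames` and `target` are hypothetical names chosen for illustration; returning `None` for a broken link (instead of a special error webnode) and treating `_parent` at the root like `_self` are assumptions of this sketch.

```python
class Frame:
    """A node of the webnode tree: one URLpage loaded under a frame name."""

    def __init__(self, name, url, children=()):
        self.name = name          # label of the edge from the parent (None at the root)
        self.url = url
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def all_frames(root):
    """Enumerate every node of the tree rooted at root."""
    yield root
    for child in root.children:
        yield from all_frames(child)


def target(root, s, label):
    """Root of the subtree replaced when a link with target name `label` in s is followed."""
    if label in ("_blank", "_top"):
        return root
    if label in ("", "_self"):
        return s
    if label == "_parent":
        # Assumption: at the root, _parent behaves like _self.
        return s.parent if s.parent is not None else s
    matches = [f for f in all_frames(root)
               if f.parent is not None and f.name == label]
    # No match, or more than one match, is a broken link (None here).
    return matches[0] if len(matches) == 1 else None


inner = Frame("inner", "http://site.example/inner.html")
main = Frame("main", "http://site.example/main.html", [inner])
nav = Frame("nav", "http://site.example/nav.html")
root = Frame(None, "http://site.example/", [nav, main])

print(target(root, nav, "main") is main)     # True
print(target(root, nav, "missing") is None)  # True
```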

Example 1. In Figure 1 we depict a webnode $w = \mathit{WebNode}(s_0)$, and two webnodes
$u$, $v$ reachable from $w$ by following links. We have $F_{s_0} = \{(a_1, \ell_1), (a_2, \ell_2)\}$; the children
of $s_0$ in $w$ are $s_1$, $s_2$: by convention, we denote $\mathit{GetUrl}(a_i) = s_i$ for all $i > 0$.

Fig. 1. A primary webnode $w = \mathit{WebNode}(s_0)$, and two secondary webnodes $u$, $v$.

Only some of the edge labels and URLpages are indicated, to avoid clutter. URLpage
$s_1$ contains the anchor tag $(a_3, \ell_2)$; taking this link leads to webnode $v$. URLpage $s_2$
contains the anchor tag $(a_4, \ell_1)$; taking this link leads to webnode $u$. URLpage $s_5$ contains
the anchor tag $(a_7, \ell_5)$. Note that the link corresponding to the anchor tag $(a_7, \ell_5)$
is broken in $v$, since there is no label $\ell_5$ in $v$; the link is not broken in $w$, and it is not
present in $u$. This illustrates how links can become broken in secondary pages.

2.2 The Webgraph 

In order to fix the structure of the webgraph, we need to establish a criterion for webnode
equality. Two webnodes $w$ and $u$ are equal, written $w = u$, if their trees of URLpages
are equal: hence, webnode equality is defined in terms of URLpage equality.
There are several possible definitions for URLpage equality; to understand the issue,
we need to explain in more detail how URLpages are fetched from the Web. Given an
URL $a_0$, we can issue an HTTP request for $a_0$. The result can either be the URLpage
$\mathit{GetUrl}(a_0)$, or a redirection URL $a_1$. In the latter case, we can issue a page request for
$a_1$, obtaining either $\mathit{GetUrl}(a_1)$, or a redirection URL $a_2$. The process continues until
either an error occurs, or we reach a $k \ge 0$ such that a request for $a_k$ returns an
URLpage $s$; we then set $\mathit{GetUrl}(a_0) = \cdots = \mathit{GetUrl}(a_k) = s$. Consider two sequences
$a_0, \ldots, a_k, \mathit{GetUrl}(a_0)$ and $b_0, \ldots, b_n, \mathit{GetUrl}(b_0)$ of redirections and final pages, and
let $s = \mathit{GetUrl}(a_0)$ and $t = \mathit{GetUrl}(b_0)$; note that we have $\rho_s = a_k$, and $\rho_t = b_n$. We
can define whether $s$ is equal to $t$, written $s = t$, in several ways.

- Textual comparison. We can define $s = t$ when $h_s = h_t$, i.e., when the texts
of the two URLpages $s$ and $t$ are identical. According to this definition, however,
different domains containing two textually identical pages (for instance, two empty
pages) would share a webnode in the webgraph, leading to unexpected results when
reachability analysis is performed. In addition, textual comparison is sensitive to
minor differences in the pages retrieved, such as visitation counter updates.

- Final URL comparison. Another possibility consists in defining $s = t$ when $a_k = b_n$,
or equivalently when $\rho_s = \rho_t$. Occasionally, however, a request to an URL
$a$ is redirected to any of a large number of URLs $c_1, \ldots, c_m$, in order either to
distribute the load between machines, or to provide slightly different content in
terms of advertising. Final URL comparison would consider $\mathit{GetUrl}(c_1) \ne \cdots \ne
\mathit{GetUrl}(c_m)$, in spite of the fact that those pages are essentially the same page.

- Redirection sequence comparison. Finally, we can define $s = t$ when $\{a_0, \ldots, a_k\}
\cap \{b_0, \ldots, b_n\} \ne \emptyset$; this criterion is more robust than final URL comparison with
respect to load-balancing and page-customization techniques.

The Web checker MCWEB adopts redirection sequence comparison as the URLpage
equality criterion, with additional heuristics used to cope with features such as automatic
index.html extensions.
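
To make the difference between the last two criteria concrete, the following Python sketch compares two redirection sequences both ways. It is an illustration only: `same_final_url` and `same_redirection_sequence` are hypothetical names, the URLs are invented, and the additional heuristics used by MCWEB are not modeled.

```python
def same_final_url(chain_a, chain_b):
    """Final URL comparison: s = t when a_k = b_n."""
    return chain_a[-1] == chain_b[-1]


def same_redirection_sequence(chain_a, chain_b):
    """Redirection sequence comparison: s = t when the two URL sequences intersect."""
    return not set(chain_a).isdisjoint(chain_b)


# Two requests for the same (invented) URL, sent to different mirrors by a load balancer.
first_visit = ["http://shop.example/", "http://m1.shop.example/"]
second_visit = ["http://shop.example/", "http://m2.shop.example/"]

print(same_final_url(first_visit, second_visit))             # False
print(same_redirection_sequence(first_visit, second_visit))  # True
```

The two visits end at different final URLs, so final URL comparison treats them as distinct pages, while the shared initial URL makes them equal under redirection sequence comparison.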

Once a notion $=$ of webnode equality has been fixed, we can define precisely the
webgraph. Let $V$ be the set of all webnodes, and let $E = \{(w, u) \mid w \in V \land \exists s \in w.\,
\exists (a, \ell) \in A_s.\ u = \mathit{dest}(w, s, (a, \ell))\}$ be the set of all edges between webnodes. The
webgraph $(V/{=}, E/{=})$ is the quotient of $(V, E)$ with respect to the equality notion
$=$. We note that this definition is not completely precise, as it depends on the function
$\mathit{GetUrl}$, that given an URL returns the corresponding URLpage. This definition also does
not capture the fact that the true connectivity of the Web is time-varying. Nevertheless, this
definition formalizes the structure of the Web to a sufficient degree for the development
of our model-checking algorithms.

We say that a webnode $w \in V$ is primary if there is an URL $a$ such that $w =
\mathit{GetWeb}(a)$, and that $w$ is secondary otherwise. Primary webnodes correspond to Web
pages that can be obtained by loading an URL with a browser. Secondary webnodes
cannot be loaded directly; they are reached by traversing links and updating the frame
structure starting from primary webnodes. Most current tools for Web analysis only
consider primary webnodes. Yet, many errors arise only in secondary webnodes, as
illustrated by Example 1. Our experience with MCWEB indicates that the difficulty of
examining all secondary webnodes is a common cause of errors on the Web.



3 Model Checking the Web



As remarked in the introduction, the ordinary $\mu$-calculus is not suited for the model
checking of the Web, since it includes several operations that are not effectively computable
on the Web. We introduce constructive $\mu$-calculus, a fixpoint calculus similar to
$\mu$-calculus, but containing only expressions that can be effectively computed.



3.1 Constructive $\mu$-Calculus

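Syntax. Constructive $\mu$-calculus ($C\mu C$) is derived from the equational $\mu$-calculus of
[1]. A $C\mu C$ formula $\langle\langle B_1, \ldots, B_n\rangle, x_m\rangle$ consists of $n > 0$ blocks $B_1, \ldots, B_n$, and of
an output variable $x_m$, with $m \in \{1, \ldots, n\}$. Each block $B_i$, for $1 \le i \le n$, has the
form $\lambda_i . x_i = e_i$, where $x_i$ is a variable, $e_i$ is a set expression, and $\lambda_i$ is a quantifier tag
equal to either $\mu x_i$, or to $\nu x_i \subseteq x_j$ for some $j > i$. Hence, the quantifier tag of the
outermost block $B_n$ must be $\mu x_n$. Each set expression $e_i$ is defined according to the
following grammar:

\[
\Phi ::= x \mid \Phi \cup \Phi \mid \Phi \cap \Phi \mid \Phi \setminus \Phi \mid \Phi \cap \theta \mid a
\mid \mathit{Post}(\Phi) \mid \mathit{G}\widetilde{\mathit{Post}}(\Phi, \Phi) \mid \mathit{GPre}(\Phi, \Phi) \mid \mathit{G}\widetilde{\mathit{Pre}}(\Phi, \Phi)
\]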
where $a$ is a constant, $x$ is one of $x_1, \ldots, x_n$, and $\theta$ is a predicate expression. Predicate
expressions are defined by the grammar

\[
\theta ::= \theta \lor \theta \mid \neg\theta \mid P,
\]

where $P$ is a predicate belonging to some basic set of predicates $\mathcal{P}$. The use of the
set difference operator in set expressions is subject to the following restriction. For
$i, j \in \{1, \ldots, n\}$, we say that the block $B_i$ depends directly on block $B_j$, written
$B_i \succ B_j$, if $x_j$ appears in $e_i$, and we let $\succeq$ be the reflexive transitive closure of $\succ$. For
$i, j \in \{1, \ldots, n\}$, we say that the variable $x_j$ occurs with negative polarity in $e_i$
if it occurs within an odd number of right-hand sides of the set-difference operator $\setminus$.
Then, we require that for all $i, j \in \{1, \ldots, n\}$, the variable $x_j$ occurs with negative
polarity in $e_i$ only when $B_j \not\succeq B_i$. We say that a $C\mu C$ formula is negation-free if it
does not contain occurrences of the set difference operator $\setminus$. We denote by $C\mu C^{+}$ the
negation-free fragment of $C\mu C$.

Syntax of Ordinary Equational $\mu$-Calculus. In order to compare the expressive power
of constructive and ordinary $\mu$-calculus, we define also the semantics of the equational
$\mu$-calculus of [1], denoted by $\mu C$. The formulas of $\mu C$ have the same structure as those
of $C\mu C$, except that the quantifier tag $\lambda_i$ can be equal to either $\mu x_i$, or to $\nu x_i$. The
syntax of set expressions is given by the grammar

\[
\Phi ::= x \mid \Phi \cup \Phi \mid \Phi \cap \Phi \mid \theta \mid \neg\theta \mid a \mid \mathit{Post}(\Phi) \mid \widetilde{\mathit{Post}}(\Phi) \mid \mathit{Pre}(\Phi) \mid \widetilde{\mathit{Pre}}(\Phi)
\]

Semantics. For conciseness, we define the semantics of a calculus that is a superset
of both $C\mu C$ and $\mu C$; the semantics of $C\mu C$ and $\mu C$ are obtained by considering
the appropriate fragments of this calculus. The semantics is defined with respect to
a Kripke structure $\mathcal{K} = (V, E, \mathcal{C}, f^{\mathcal{C}}, \mathcal{P}, f^{\mathcal{P}})$, where $(V, E)$ is a graph, $\mathcal{C}$ is a set of
constants, $f^{\mathcal{C}} : \mathcal{C} \mapsto V$ is the interpretation of the constants, $\mathcal{P}$ is a set of predicates,
and $f^{\mathcal{P}} : \mathcal{P} \mapsto 2^V$ is the interpretation of the predicates. In the model checking of
the Web, we take $V$, $E$ as in the webgraph, $\mathcal{C}$ to be the set of valid URLs, $f^{\mathcal{C}}$ to be
$\mathit{GetWeb}$, $\mathcal{P}$ to be a set of effectively checkable predicates defined on webnodes, and
$f^{\mathcal{P}}(P) = \{w \in V \mid w \models P\}$ for all $P \in \mathcal{P}$. Given such a Kripke structure, all the
operators in set and predicate expressions have their standard meanings, except for the
predecessor and successor operators. The semantics of the predecessor and successor
operators is defined, for all $U, W \subseteq V$, by



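\[
\begin{aligned}
\mathit{Pre}(W) &= \{u \in V \mid \exists v \in W.\ (u, v) \in E\} \\
\widetilde{\mathit{Pre}}(W) &= \{u \in V \mid \forall v.\ ((u, v) \in E \rightarrow v \in W)\} \\
\mathit{Post}(W) &= \{u \in V \mid \exists v \in W.\ (v, u) \in E\} \\
\widetilde{\mathit{Post}}(W) &= \{u \in V \mid \forall v.\ ((v, u) \in E \rightarrow v \in W)\} \\
\mathit{GPre}(U, W) &= U \cap \mathit{Pre}(W) \\
\mathit{G}\widetilde{\mathit{Pre}}(U, W) &= U \cap \widetilde{\mathit{Pre}}(W) \\
\mathit{G}\widetilde{\mathit{Post}}(U, W) &= U \cap \widetilde{\mathit{Post}}(W)
\end{aligned}
\]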



The intuition is that in $C\mu C$ we can compute the set of predecessors of a given set
$W$ of webnodes only relative to another set $U$ of webnodes; similarly for the other
constructive operators. The operational semantics of constructive $\mu$-calculus, given by
Algorithm 1 below, will ensure that all the webnodes in $U$ have already been explored
when $\mathit{GPre}(U, W)$ is computed (resp. $\mathit{G}\widetilde{\mathit{Pre}}(U, W)$, $\mathit{G}\widetilde{\mathit{Post}}(U, W)$), thus ensuring that
all the links from $U$ to $W$ are known.
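
The following Python sketch illustrates why the guarded predecessor operators need only the outgoing links of already explored nodes. The adjacency dictionary `explored` and the function names are hypothetical; the sketch covers $\mathit{Post}$, $\mathit{GPre}$ and the universal guarded predecessor only, and it assumes that every node of $U$ appears as a key of the dictionary.

```python
def post(out_links, W):
    """Post(W): nodes reached by one link from an explored node of W."""
    return {v for u in W for v in out_links.get(u, ())}


def gpre(out_links, U, W):
    """GPre(U, W) = U ∩ Pre(W): nodes of U with at least one link into W."""
    return {u for u in U if any(v in W for v in out_links[u])}


def gpre_all(out_links, U, W):
    """Universal guarded predecessor: nodes of U all of whose links lead into W."""
    return {u for u in U if all(v in W for v in out_links[u])}


# Outgoing links of the nodes explored so far; node "d" is known only as a link target.
explored = {"a": ["b", "c"], "b": ["a"], "c": ["d"]}
U = set(explored)

print(sorted(post(explored, {"a"})))     # ['b', 'c']
print(sorted(gpre(explored, U, {"a"})))  # ['b']
print(sorted(gpre_all(explored, U, U)))  # ['a', 'b']
```

Node `d` never has to be fetched: both guarded operators inspect only the links leaving nodes of `U`.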

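The definition of semantics follows the lines of [1]. Consider a Kripke structure $\mathcal{K}$
and a formula $\langle\langle B_1, \ldots, B_n\rangle, x_m\rangle$ of $C\mu C$, where each block $B_i$, $1 \le i \le n$, has the
form $\lambda_i . x_i = e_i$, for $\lambda_i$ equal to either $\mu x_i$ or $\nu x_i \subseteq x_j$. Let $\Gamma = (\{x_1, \ldots, x_n\} \mapsto 2^V)$
be the set of valuations of the variables in the formula. Given $\gamma \in \Gamma$ and $1 \le i \le n$, we
indicate with $\gamma \circ (x_i = U)$ the valuation that coincides with $\gamma$, except that it associates
value $U \subseteq V$ to $x_i$. Given a valuation $\gamma$ and a set expression $e_i$, for $1 \le i \le n$, we
denote by $[\![e_i \mid \mathcal{K}, \gamma]\!] \subseteq V$ the value of $e_i$ in the Kripke structure $\mathcal{K}$ under valuation $\gamma$.
Given $\gamma \in \Gamma$ we define recursively, for $i = 0$ to $n$, the valuation $\mathit{Eval}_{\mathcal{K}}(\langle B_1, \ldots, B_i\rangle \mid
\gamma) \in \Gamma$. The definition relies on two auxiliary functions $g_{\mathcal{K},\gamma}, f_{\mathcal{K},\gamma} : \Gamma \mapsto \Gamma$ defined
as follows:

\[
\begin{aligned}
g_{\mathcal{K},\gamma}(\delta) &= \mathit{Eval}_{\mathcal{K}}(\langle B_1, \ldots, B_{i-1}\rangle \mid \delta) \circ (x_{i+1} = \gamma(x_{i+1})) \circ \cdots \circ (x_n = \gamma(x_n)) \\
f_{\mathcal{K},\gamma}(\delta) &= g_{\mathcal{K},\gamma}(\delta) \circ (x_i = [\![e_i \mid \mathcal{K}, g_{\mathcal{K},\gamma}(\delta)]\!]) && \text{if } \lambda_i \text{ is } \mu x_i \text{ or } \nu x_i \\
f_{\mathcal{K},\gamma}(\delta) &= g_{\mathcal{K},\gamma}(\delta) \circ (x_i = \gamma(x_j) \cap [\![e_i \mid \mathcal{K}, g_{\mathcal{K},\gamma}(\delta)]\!]) && \text{if } \lambda_i \text{ is } \nu x_i \subseteq x_j
\end{aligned}
\]

We then define $\mathit{Eval}_{\mathcal{K}}(\langle B_1, \ldots, B_i\rangle \mid \gamma) = \lambda\delta.(\delta = f_{\mathcal{K},\gamma}(\delta))$, where $\lambda = \mu$ if $\lambda_i$ is $\mu x_i$,
and $\lambda = \nu$ if $\lambda_i$ is $\nu x_i \subseteq x_j$. The restrictions on the use of negation, together with the
Tarski-Knaster theorem [22], ensure that the fixpoints exist. It can be readily verified
that $\mathit{Eval}_{\mathcal{K}}(\langle B_1, \ldots, B_n\rangle \mid \gamma)$ does not depend on $\gamma$. The meaning of the complete
formula is the valuation of the output variable: we define $[\![\langle\langle B_1, \ldots, B_n\rangle, x_m\rangle]\!]_{\mathcal{K}} =
\mathit{Eval}_{\mathcal{K}}(\langle B_1, \ldots, B_n\rangle \mid \gamma)(x_m)$, for an arbitrary $\gamma \in \Gamma$.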

3.2 Expressivity 

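In order to study the relationship between the expressive power of $C\mu C$ and $\mu C$,
we consider fixed infinite and countable sets $\mathcal{P}$ and $\mathcal{C}$ of predicates and constants,
so that the syntax of the formulas is fixed. Given a class $\mathcal{U}$ of Kripke structures, let
$S_{\mathcal{U}} = \bigcup \{V \mid (V, E, \mathcal{C}, f^{\mathcal{C}}, \mathcal{P}, f^{\mathcal{P}}) \in \mathcal{U}\}$ be the set of all states. A formula $\phi$ of a fixpoint
calculus defines a function $[\![\phi]\!] : \mathcal{U} \mapsto 2^{S_{\mathcal{U}}}$ by $[\![\phi]\!](\mathcal{K}) = [\![\phi]\!]_{\mathcal{K}}$. Given $\mathcal{U}$ and two fixpoint
calculi $C$ and $C'$, we say that $C$ is as expressive as $C'$ over $\mathcal{U}$, written $C \sqsupseteq_{\mathcal{U}} C'$, if for
every formula $\phi'$ of $C'$ there is a formula $\phi$ of $C$ such that $[\![\phi]\!]$ and $[\![\phi']\!]$ are the same
function. We say that $C$ and $C'$ are equally expressive over $\mathcal{U}$, written $C =_{\mathcal{U}} C'$, if both
$C \sqsupseteq_{\mathcal{U}} C'$ and $C' \sqsupseteq_{\mathcal{U}} C$ hold, and we say that $C$ is strictly more expressive than $C'$
over $\mathcal{U}$, written $C \sqsupset_{\mathcal{U}} C'$, if $C \sqsupseteq_{\mathcal{U}} C'$ holds but $C' \sqsupseteq_{\mathcal{U}} C$ does not. Let $\mathcal{K}_{fin}$ and $\mathcal{K}_{cnt}$ be
the classes of Kripke structures with finite and countable state space, respectively. The
following theorem relates the expressive power of $\mu C$ and $C\mu C^{+}$.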

Theorem 1 . pC CpC^, and pC Hkc„, CpC^. 

The difference in expressive power is essentially due to the inability of $C\mu C$ to consider
portions of the Kripke structure that are unreachable from named constants.
This is confirmed by the following result. We say that a class $\mathcal{U}$ of Kripke structures
is finitely rooted if there is a finite set of constants $\{a_1, \ldots, a_n\}$ such that for
all $(V, E, \mathcal{C}, f^{\mathcal{C}}, \mathcal{P}, f^{\mathcal{P}}) \in \mathcal{U}$, we have that every state of $V$ is reachable in $(V, E)$ from
$f^{\mathcal{C}}(a_1) \cup \cdots \cup f^{\mathcal{C}}(a_n)$ in a finite number of steps.

Theorem 2. For all classes $\mathcal{U}$ of finitely-rooted Kripke structures, we have
$\mu C =_{\mathcal{U}} C\mu C^{+}$.



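Proof. There is a straightforward translation of $C\mu C^{+}$ into $\mu C$. The translation from
$\mu C$ to $C\mu C^{+}$ is as follows. Consider a $\mu C$ expression $\langle\langle B_1, \ldots, B_n\rangle, x_m\rangle$, where for
$1 \le i \le n$ the block $B_i$ has the form $\lambda x_i . x_i = e_i$, for $\lambda \in \{\mu, \nu\}$. An equivalent $C\mu C$
expression is $\langle\langle B'_1, \ldots, B'_n, B'_{n+1}\rangle, x_m\rangle$, where the block $B'_{n+1}$ is $\mu y . y = \mathit{GetWeb}(a_1) \cup
\cdots \cup \mathit{GetWeb}(a_n) \cup \mathit{Post}(y)$, and for $1 \le i \le n$, the block $B'_i$ is obtained from $B_i$ by
replacing $\nu x_i$ with $\nu x_i \subseteq y$ and $\mathit{Pre}(\Phi)$ with $\mathit{GPre}(y, \Phi)$, $\widetilde{\mathit{Pre}}(\Phi)$ with $\mathit{G}\widetilde{\mathit{Pre}}(y, \Phi)$, and
$\widetilde{\mathit{Post}}(\Phi)$ with $\mathit{G}\widetilde{\mathit{Post}}(y, \Phi)$. $\Box$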

The converse is also true, under some general conditions. We say that a Kripke structure 
is non-trivial if it contains at least one predicate symbol. 

Theorem 3. Consider a class $\mathcal{U}$ of non-trivial Kripke structures. If $\mu C =_{\mathcal{U}} C\mu C^{+}$,
then all the structures in $\mathcal{U}$ are finitely rooted.

The following result follows from the presence of the set-difference operator $\setminus$ in the
definition of $C\mu C$, and can be proved similarly to Theorem 2.

Theorem 4 . On finitely-rooted Kripke structures, the fixpoint calculus CpC is closed 
under complementation. 

The difference in expressive power between $C\mu C$ and $\mu C$ on finitely-rooted structures
is due to the fact that $\mu C$ is not closed under complementation. Let $\mu C^{\neg}$ be the calculus
obtained by adding to $\mu C$ the operator $\mathit{DGetWeb}$, applicable only to constants, with
semantics defined by $\mathit{DGetWeb}(a) = \{w \in V \mid w \ne f^{\mathcal{C}}(a)\}$. The calculus $\mu C^{\neg}$ is then
closed under complementation, leading to the following theorem.

Theorem 5 . The following assertions hold. 

1. pC^ CpC, and pC^ □k™, CpC. 

2. For all classes $\mathcal{U}$ of finitely-rooted Kripke structures, we have $\mu C^{\neg} =_{\mathcal{U}} C\mu C$.

3.3 Evaluation of Constructive $\mu$-Calculus Formulas

While $\mu C$ and $C\mu C$ have similar expressive power, the calculus $C\mu C$ guarantees that,
whenever the interpretations of the variables at the fixpoint consist of finite sets, then
the fixpoint itself can be computed in finite time. An algorithm for doing so is given
below.
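
Algorithm 1. Input: a Kripke structure $\mathcal{K} = (V, E, \mathcal{C}, f^{\mathcal{C}}, \mathcal{P}, f^{\mathcal{P}})$, and a $C\mu C$ formula
$\phi = \langle\langle B_1, \ldots, B_n\rangle, x_m\rangle$, where each block $B_i$, $1 \le i \le n$, has the form
$\lambda_i . x_i = e_i$.

Output: $[\![\phi]\!]_{\mathcal{K}}$.

Procedure: Let $\Gamma = \{x_1, \ldots, x_n\} \mapsto 2^V$. Given a valuation $\gamma \in \Gamma$, we define recursively,
for $i = 0$ to $n$, the valuation $\mathit{Compute}(\langle B_1, \ldots, B_i\rangle \mid \gamma) \in \Gamma$. For $i = 0$, we let
$\mathit{Compute}(\langle\rangle \mid \gamma) = \gamma$. For $i > 0$, the definition is as follows.

Init: If $\lambda_i$ is $\mu x_i$, then let $\gamma_0 = \gamma \circ (x_i = \emptyset)$; if $\lambda_i$ is $\nu x_i \subseteq x_j$, then let
$\gamma_0 = \gamma \circ (x_i = \gamma(x_j))$.

Update: For $k \ge 0$, let $\gamma'_k = \mathit{Compute}(\langle B_1, \ldots, B_{i-1}\rangle \mid \gamma_k)$,
$W_k = [\![e_i \mid \mathcal{K}, \gamma'_k]\!]$, and $\gamma_{k+1} = \gamma'_k \circ (x_i = W_k)$.

Define: $\mathit{Compute}(\langle B_1, \ldots, B_i\rangle \mid \gamma) = \lim_{k \to \infty} \gamma_k$.

Return: $\mathit{Compute}(\langle B_1, \ldots, B_n\rangle \mid \gamma)(x_m)$, where $\gamma$ is arbitrary.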
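
A compact Python sketch of this nested iteration, restricted to finite graphs, is given below. It is an illustration under stated assumptions rather than the MCWEB implementation: blocks are represented as hypothetical tuples `(tag, variable, bound, expression)`, expressions are Python callables over a valuation dictionary, the limit is detected by comparing successive valuations, and the example graph is invented. The three blocks encode the pattern "nodes reachable from `a` that cannot reach `a` again".

```python
def post(edges, W):
    """Post(W): successors of the nodes in W."""
    return {v for (u, v) in edges if u in W}


def gpre(edges, U, W):
    """GPre(U, W) = U ∩ Pre(W): nodes of U with some successor in W."""
    return {u for (u, v) in edges if u in U and v in W}


def compute(blocks, i, gamma):
    """Valuation obtained by evaluating blocks B_1..B_i starting from gamma."""
    if i == 0:
        return dict(gamma)
    tag, var, bound, expr = blocks[i - 1]
    current = dict(gamma)
    # Init: empty set for a mu-block, value of the bounding variable for a nu-block.
    current[var] = set() if tag == "mu" else set(gamma[bound])
    while True:
        inner = compute(blocks, i - 1, current)  # gamma'_k
        updated = dict(inner)
        updated[var] = expr(inner)               # x_i = W_k
        if updated == current:                   # limit reached (finite case)
            return updated
        current = updated


def evaluate(blocks, output):
    """Value of the output variable, starting from the all-empty valuation."""
    gamma = {var: set() for (_, var, _, _) in blocks}
    return compute(blocks, len(blocks), gamma)[output]


# Invented example graph: a and b form a cycle; c and d cannot get back to a.
EDGES = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c")}
DOMAIN = {"a", "b", "c", "d"}
BLOCKS = [
    ("mu", "z", None, lambda g: {"a"} | gpre(EDGES, g["x"], g["z"])),
    ("mu", "x", None, lambda g: {"a"} | (post(EDGES, g["x"]) & DOMAIN)),
    ("mu", "y", None, lambda g: g["x"] - g["z"]),
]

print(sorted(evaluate(BLOCKS, "y")))  # ['c', 'd']
```

Inner blocks are re-initialized and recomputed at every iteration of an outer block, which mirrors the recursive structure of the algorithm at the price of repeated work.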

The following theorem states that the fixpoints of $C\mu C$, if finite, can be effectively computed.
We say that an operation can be effectively computed if it involves only finitely
many states of the Kripke structure.

Theorem 6. Consider a $C\mu C$ formula $\langle\langle B_1, \ldots, B_n\rangle, x_m\rangle$, and assume that for a
variable interpretation $\gamma$, we have $|\mathit{Eval}_{\mathcal{K}}(\langle B_1, \ldots, B_n\rangle \mid \gamma)(x_i)| < \infty$ for all
$1 \le i \le n$. Then, Algorithm 1 consists of effectively computable steps, and it terminates
returning $[\![\langle\langle B_1, \ldots, B_n\rangle, x_m\rangle]\!]_{\mathcal{K}}$.

The result is a consequence of the fact that, if all variables have finite extension at the
fixpoint, then only a finite portion of the Kripke structure is explored. Note that the
result is independent of the cardinality of $V$. In contrast, it is well known that when
$V$ is infinite, the formulas of $\mu C$ cannot be evaluated iteratively, even when the fixpoints
are finite.

3.4 Predicates for Web Analysis 

After some experimentation, we have chosen to include in the Web model checker
MCWEB the following families of predicates, for all strings $\alpha_1, \ldots, \alpha_k$, domains $\Delta$, and $k > 0$:

- predicate $\texttt{contains}_{\alpha_1, \alpha_2, \ldots, \alpha_k}$ holds for a webnode $w$ if there is an URLpage
$s \in w$ such that $h_s$ contains all the strings $\alpha_1, \ldots, \alpha_k$;

- predicate $\texttt{in\_domain}_{\Delta}$ holds for a webnode $w$ if there is an URLpage $s \in w$ such
that $\rho_s$ contains the substring $\Delta$;

- predicate $\texttt{all\_in\_domain}_{\Delta}$ holds for a webnode $w$ if all URLpages $s \in w$ are
such that $\rho_s$ contains the substring $\Delta$;

- predicate $\texttt{http\_error}_{k}$ holds for a webnode $w$ if the HTTP error $k$ occurred
while loading some URLpage in $w$;

- predicate $\texttt{frames\_error}$ is a catch-all predicate, that holds for a webnode $w$ if
the frame structure at $w$ contains errors. Among the errors currently checked are:
duplicated frame names (a name $\ell$ that occurs in more than one frame tag), frame
trees deeper than a fixed threshold, and non-existent link targets (anchor tags $(a, \ell)$
such that $\ell$ does not appear in any frame tag).

3.5 A Semi-decision Procedure for Non-emptiness

Consider a formula $\phi = \langle\langle B_1, \ldots, B_n\rangle, x_m\rangle$, where each block $B_i$ has the
form $\lambda_i . x_i = e_i$, for $1 \le i \le n$. During the evaluation of $\phi$ according to Algorithm 1,
we call checkpoints the stages where all the variables $x_i$ with quantifier tag equal to
$\nu x_i \subseteq x_j$ for some $j > i$ have reached a fixpoint (even if some variables with $\mu$-tag
have not). Then, if the interpretation of the output variable $x_m$ at some checkpoint is
non-empty, we know that also $[\![\phi]\!]_{\mathcal{K}} \ne \emptyset$. To make this observation precise, we consider
a $C\mu C^{+}$ formula $\phi = \langle\langle B_1, \ldots, B_n\rangle, x_m\rangle$, and we let $\{i \in \{1, \ldots, n\} \mid \lambda_i \text{ is } \mu x_i\} =
\{i_1, \ldots, i_j\}$ be the set of indices of the $\mu$-blocks in $\phi$; we denote by $\psi(\phi) = j$ the
number of such indices. Given $k_1, k_2, \ldots, k_j \ge 0$, we compute $[\![\phi]\!]_{\mathcal{K}}^{k_1, \ldots, k_j}$ by following
Algorithm 1, except that for $1 \le l \le j$ we take $\mathit{Compute}(\langle B_1, \ldots, B_{i_l}\rangle \mid \gamma) =
\gamma_{k_l}$. Hence, $[\![\phi]\!]_{\mathcal{K}}^{k_1, \ldots, k_j}$ is computed by performing $k_1, k_2, \ldots, k_j$ iterations of the
$\mu$-blocks, rather than waiting until the fixpoint is reached.

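Theorem 7. For all $C\mu C^{+}$ formulas $\phi$ and Kripke structures $\mathcal{K}$, if $[\![\phi]\!]_{\mathcal{K}}^{k_1, \ldots, k_j} \ne \emptyset$
for some $k_1, \ldots, k_j \ge 0$, then $[\![\phi]\!]_{\mathcal{K}} \ne \emptyset$.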

Given a $C\mu C^{+}$ formula $\phi$ and a (possibly infinite) finitely-branching Kripke structure
$\mathcal{K}$, this theorem provides a semi-decision procedure for $[\![\phi]\!]_{\mathcal{K}} \ne \emptyset$: it suffices to enumerate
the lists of non-negative integers $(k_1, \ldots, k_j)$, checking for each list whether
$[\![\phi]\!]_{\mathcal{K}}^{k_1, \ldots, k_j} \ne \emptyset$. As an example, consider the formula

\[
\phi = \langle\langle \nu y \subseteq x . y = \widetilde{\mathit{Pre}}(y),\ \mu x . x = \texttt{in\_domain}_{\Delta} \cap (\mathit{Post}(x) \cup a)\rangle, y\rangle,
\]

where $\Delta$ is a domain name, and $a$ is an URL in that domain. If $\mathcal{K}$ is the webgraph,
then $[\![\phi]\!]_{\mathcal{K}}$ is the set of webnodes in domain $\Delta$ that are reachable from the URL $a$, and
that have no link sequence leading outside $\Delta$. The variable $x$ keeps track of the portion
of $\Delta$ that is reachable from $a$. If the domain $\Delta$ contains infinitely many webnodes (as
can be the case in sites with dynamically generated pages), then the evaluation of $[\![\phi]\!]_{\mathcal{K}}$
does not terminate. On the other hand, we can obtain a semi-decision procedure for
$[\![\phi]\!]_{\mathcal{K}} \ne \emptyset$ by evaluating $[\![\phi]\!]_{\mathcal{K}}^{k}$ for $k = 0, 1, 2, \ldots$, and by checking for non-emptiness
for each value of $k$. This provides a semi-decision procedure for detecting pages in a
Web site that cannot reach the rest of the Web.
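
The bounded evaluation can be sketched in Python on a lazily explored graph. Everything in the sketch is hypothetical: `successors` describes an invented infinite graph (the chain 1, 2, 3, ... never ends, and node -1 links only to itself), `in_domain` accepts every node, and `max_k` is an optional cut-off added for convenience. Only the outgoing links of nodes already collected in `x` are ever inspected.

```python
from itertools import count


def successors(n):
    """Out-links of node n in an invented, lazily explored infinite graph."""
    if n == -1:
        return [-1]      # a page that links only to itself
    if n == 0:
        return [1, -1]
    return [n + 1]       # an endless chain of generated pages


def in_domain(n):
    return True          # every node of this toy graph belongs to the domain


def reachable_after(k, root):
    """k iterations of the mu-block x = in_domain ∩ (Post(x) ∪ {root})."""
    x = set()
    for _ in range(k):
        candidates = {root} | {s for u in x for s in successors(u)}
        x = {n for n in candidates if in_domain(n)}
    return x


def closed_part(x):
    """Greatest y ⊆ x such that every link leaving a node of y stays in y."""
    y = set(x)
    while True:
        new_y = {u for u in x if all(s in y for s in successors(u))}
        if new_y == y:
            return y
        y = new_y


def first_witness(root, max_k=None):
    """Smallest k whose checkpoint is non-empty, with the checkpoint value."""
    for k in count():
        if max_k is not None and k > max_k:
            return None
        y = closed_part(reachable_after(k, root))
        if y:
            return k, y


print(first_witness(0))  # (2, {-1})
```

A full evaluation of the least fixpoint for `x` would never terminate on this graph, whereas the enumeration stops as soon as a checkpoint yields a non-empty set.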



4 Web Model Checking in Practice 

In order to experiment with Web model checking, we have implemented the model
checker MCWEB. The checker MCWEB is written in Python; its input consists in constructive
$\mu$-calculus formulas, augmented with the capability of post-processing the output
in order to perform quantitative analysis of Web properties.

In some domains, such as hardware, the cost of errors that go undetected until the
production stage is very large, and consequently a large effort is made in order to detect
them early. Formal verification methods such as model checking are usually called to
help in finding errors that cannot be found with other methods. Consequently, finding
errors with formal methods is a difficult task. In Web model checking, the situation is
very different. Due to the lower cost of undetected errors on the Web, many collections
of Web pages are checked cursorily, if at all, and in our experience errors are abundant,
and come in great variety. The sites where we found the highest density of errors were
medium-sized sites: small sites often have a simple structure that limits the number of
errors; large commercial sites are usually produced with the help of automated tools,
that help in avoiding structural errors. Nevertheless, errors were found even in large
sites such as amazon.com.

In the course of the experimentation with MCWEB, we have identified some cate- 
gories of errors and properties that are commonly of interest. 

- Broken links. Detecting broken links is an ability that MCWEB shares with many 
other tools. MCWEB implicitly checks for broken links whenever the Post operator 
is applied to a set of webnodes. 

- Duplicated frame names. MCWEB checks automatically for ill-formed frame structures
using the predicate $\texttt{frames\_error}$ described earlier. For example, to check
that no webnode with ill-formed frame structure is present in the site $\Delta$ with home
page $a$, it suffices to evaluate the formula $\langle\langle \mu x . x = a \cup (\mathit{Post}(x) \cap \texttt{in\_domain}_{\Delta}),
\mu y . y = x \cap \texttt{frames\_error}\rangle, y\rangle$.

- Non-hierarchical frame content. If an URLpage $t$ of webnode $w$ is not in the same
domain as the root URLpage $s$ of $w$, then the content and the links in $t$ are typically
not under the control of the author of $s$. Moreover, if $s$ can be reached from $t$, then
this usually leads to a webnode containing two instances of the same URLpage $s$.
We can check that all webnodes in domain $\Delta$ are composed only of URLpages in
$\Delta$ by evaluating the formula $\langle\langle \mu x . x = a \cup (\mathit{Post}(x) \cap \texttt{in\_domain}_{\Delta}), \mu y . y =
x \cap \neg\texttt{all\_in\_domain}_{\Delta}\rangle, y\rangle$, where $a$ is the home page of $\Delta$.

- Reachability. Suppose that $A$ is a set of webnodes containing publicly available
information, $B$ is a set of webnodes with private content, and $C$ is a set of access
control webnodes. We can check that all paths from $A$ to $B$ in domain $\Delta$ contain
a webnode in $C$ by checking the emptiness of the formula $\langle\langle \mu y . y = (x \cap A) \cup
(\mathit{Post}(y) \cap \texttt{in\_domain}_{\Delta} \cap \neg C), \mu x . x = a \cup (\mathit{Post}(x) \cap \texttt{in\_domain}_{\Delta}), \mu z . z =
y \cap B\rangle, z\rangle$, where we assume that $A$, $B$, $C$ are definable in terms of predicate
$\texttt{contains}$, and $a$ is the home page of $\Delta$.

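- Repeated reachability. To compute which pages of a Web site $\Delta$ cannot reach
the home page $a$ without leaving $\Delta$, we can evaluate the formula $\langle\langle \mu z . z = a \cup
\mathit{GPre}(x, z), \mu x . x = a \cup (\mathit{Post}(x) \cap \texttt{in\_domain}_{\Delta}), \mu y . y = x \setminus z\rangle, y\rangle$.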

- Longest paths. MCWEB also contains an extension that enables the computation of
the longest and shortest paths in a set of webnodes, where the “length” of a path
consists in the number of bytes, or the number of links, that must be downloaded in
order to follow it. For example, to find the all-pair longest path between webnodes
of a domain $\Delta$, MCWEB post-processes the output of the formula $\langle\langle \mu x . x = a \cup
(\mathit{Post}(x) \cap \texttt{in\_domain}_{\Delta})\rangle, x\rangle$. The computation of the all-pair longest path can
provide information about the bottlenecks in the navigation of a site.



Acknowledgments 

I would like to thank Tom and Monika Henzinger, Jan Jannink, and Freddy Mang for 
many helpful discussions and suggestions. 







References 

1. G. Bhat and R. Cleaveland. Efficient model checking via the equational $\mu$-calculus. In Proc.
11th IEEE Symp. Logic in Comp. Sci., pages 304-312, 1996.

2. J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and L.J. Hwang. Symbolic model checking:
$10^{20}$ states and beyond. In Proc. 5th IEEE Symp. Logic in Comp. Sci., pages 428-439.
IEEE Computer Society Press, 1990.

3. J.R. Burch, K.L. McMillan, E.M. Clarke, and D.L. Dill. Sequential circuit verification using
symbolic model checking. In Proc. of the 27th ACM/IEEE Design Automation Conference,
pages 46-51, Orlando, FL, USA, June 1990.

4. E.M. Clarke and E.A. Emerson. Design and synthesis of synchronization skeletons using
branching time temporal logic. In Proc. Workshop on Logic of Programs, volume 131 of
Lect. Notes in Comp. Sci., pages 52-71. Springer-Verlag, 1981.

5. Electronic Software Publishing Co. Linkscan. http://www.elsop.com/linkscan/.

6. Watchfire Co. Linkbot. http://www.watchfire.com/products/linkbot.htm.

7. R.T. Fielding. Maintaining distributed hypertext infostructures: Welcome to MOMspider's
web. In Proceedings of First Intl. Conference on the World-Wide Web (WWW 94), 1994.

8. Voget Selbach Enterprises GmbH. Link tester.
http://vse-online.com/link-tester/.

9. Tilman Hausherr. Link sleuth. http://home.snafu.de/tilman/xenulink.html.

10. T.A. Henzinger, O. Kupferman, and S. Qadeer. From prehistoric to postmodern symbolic
model checking. In A.J. Hu and M.Y. Vardi, editors, CAV 98: Computer-aided Verification,
Lecture Notes in Computer Science 1427, pages 195-206. Springer-Verlag, 1998.

11. Biggbyte Software Inc. Infolink.
http://www.biggbyte.com/infolink/index.html.

12. Link Alarm Inc. Link alarm. http://www.linkalarm.com/.

13. NetMechanic Inc. Html toolbox. http://www.netmechanic.com/.

14. InContext. Web analyzer 2.0. http://www.incontext.com/WAinfo.html.

15. D. Kozen. Results on the propositional $\mu$-calculus. Theoretical Computer Science,
27(3):333-354, 1983.

16. C. Musciano and B. Kennedy. HTML: The Definitive Guide. O'Reilly & Associates, Inc.,
1998. Third Edition.

17. J.P. Queille and J. Sifakis. Specification and verification of concurrent systems in Cesar. In
Proc. 5th International Symposium on Programming, volume 137 of Lect. Notes in Comp.
Sci., pages 337-351. Springer-Verlag, 1981.

18. D. Raggett, A. Le Hors, and I. Jacobs. HTML 4.01 specification, 1999. W3C Recommendation
24 December 1999.

19. Internet Software Services. Theseus. http://www.matterform.com/theseus/.

20. IXActa Visual Software. Ixsite web analyzer.
http://ixacta.com/products/ixsite/.

21. DACPro Computer Solutions. Webtester.
http://awsd.com/scripts/webtester/.

22. A. Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of
Mathematics, 25(2):285-309, 1955.




Distributed Symbolic Model Checking for $\mu$-Calculus



Orna Grumberg$^{1}$, Tamir Heyman$^{1,2}$, and Assaf Schuster$^{1}$

$^{1}$ Computer Science Department, Technion, Haifa, Israel
$^{2}$ IBM Haifa Research Laboratories, Haifa, Israel



Abstract. In this paper we propose a distributed symbolic algorithm for model
checking of propositional $\mu$-calculus formulas. $\mu$-calculus is a powerful formalism,
and many problems like (fair) CTL and LTL model checking can be solved
using $\mu$-calculus model checking. Previous works on distributed symbolic
model checking were restricted to reachability analysis and safety properties.
This work thus significantly extends the scope of properties that can be verified
for very large designs.

The algorithm distributively evaluates subformulas. It results in sets of states
which are evenly distributed among the processes. We show that this algorithm
is scalable, and thus can be implemented on huge distributed clusters of computing
nodes. In this way, the memory modules of the computing nodes collaborate
to create a very large store, thus enabling the checking of much larger designs.
We formally prove the correctness of the parallel algorithm. We complement the
distribution of the state sets by showing how to distribute the transition relation.



1 Introduction 



In the early 1980s, model checking procedures were suggested that
could handle systems with a few thousand states. In the early 1990s, symbolic model
checking methods were introduced. These methods, based on Binary Decision Diagrams
(BDDs), could verify systems with 10^20 states and more. This progress
has made model checking applicable to industrial designs of medium size. Significant
efforts have been made since to fight the state explosion problem, but the need to verify
larger systems grows faster than the capacity of any newly developed method.

Recently, a promising new method for increasing the memory capacity was introduced.
The method uses the collective pool of memory modules in a network of processes.
In earlier work, distributed symbolic reachability analysis was performed to find
the set of all states reachable from the initial states. A distributed symbolic
on-the-fly algorithm has also been applied in order to model check properties written as
regular expressions. Experimental results show that distributed methods can achieve an
average memory scale-up of 300 on 500 processes. Consequently, they find errors that
were not found by sequential tools.

This paper extends the scope of properties that can be verified for large designs by
presenting a distributed symbolic model checking algorithm for the μ-calculus. The
μ-calculus is a powerful formalism for expressing properties of transition systems using
fixpoint operators. Many verification problems can be solved by translating them into
μ-calculus model checking. Such problems include (fair) CTL model checking, LTL model
checking, bisimulation equivalence, and language containment of ω-regular automata.
Many algorithms for μ-calculus model checking have been suggested.

In this work we parallelize a simple sequential algorithm from the literature. The
algorithm works bottom-up through the formula, evaluating each subformula based on the
evaluation of its own subformulas. A formula is interpreted as the set of states in which
it is true. Thus, for each μ-calculus operation, the algorithm receives a set (or sets) of
states and returns a new set of states.

The distributed algorithm follows the same lines as the sequential one, except that
each process runs its own copy of the algorithm and each set of states is stored
distributively among the processes. Every process owns a slice of the set, so that the
disjunction of all slices contains the whole set. An operation is now performed on a set
(or sets) of slices and returns a set of slices. At no point in the distributed algorithm is
a whole set stored by a single process.

Distributed computation can be subtle for some operations. For instance, in order
to evaluate a formula of the form ¬g, the set of states satisfying g must be complemented.
This operation cannot be carried out locally by each process. Rather, each
process sends to every other process the states owned by the receiver that, to the best of
the sender's knowledge, are not in g. If none of the processes "knows" that a state is in g,
then it is (distributively) decided to be in ¬g.

While performing an operation, a process may obtain states that it does not own.
For instance, when evaluating the formula EX f, a process will find the set of all
predecessors of states in its slice for f. However, some of these predecessors may belong
to the slice of another process. Therefore, the procedure exch is executed (in parallel)
by all processes, and each process sends its non-owned states to their respective owners.

Memory requirements are kept low through frequent calls to a memory balancing
procedure, which ensures that each set is partitioned evenly among the processes.
As a result, the memory requirements, commonly proportional to the size of the
manipulated set, are evenly distributed among the processes. However, this also
requires different slicing functions for different sets. Consequently, we may need to apply
an operation to two sets that are sliced according to different partitions. In the case of
conjunction, for instance, the two sets must first be re-sliced according to the same
partition. Only then do the processes apply conjunction to their individual slices.

Distributing the sets of states is only one facet of the problem. The transition relation
also strongly influences the memory peaks that appear during the computation of
pre-image (EX) operations. The pre-image operation has one of the highest memory
requirements in model checking. Even when its final result is of tractable size, its
intermediate results might exhaust the memory. We propose a scalable distributed method
for the pre-image computation, including partitioning of the transition relation.



2 Preliminaries 

2.1 The Propositional μ-Calculus

Below we define the propositional μ-calculus. We will not distinguish between a
set of states and the boolean function that characterizes this set. By abuse of notation
we will apply both set operations and boolean operations to sets and boolean functions.
Let $AP$ be a set of atomic propositions and let $VAR = \{Q, Q_1, Q_2, \ldots\}$ be a set of
relational variables. The μ-calculus formulas are defined as follows: if $p \in AP$, then $p$
is a formula; a relational variable $Q \in VAR$ is a formula; if $f$ and $g$ are formulas, then
$\neg f$, $f \wedge g$, $f \vee g$, and $EX\,f$ are formulas; if $Q \in VAR$ and $f$ is a formula, then
$\mu Q.f$ and $\nu Q.f$ are formulas. The μ-calculus consists of the set of closed formulas, in
which every relational variable $Q$ is within the scope of $\mu Q$ or $\nu Q$.

Formulas of the μ-calculus are interpreted with respect to a transition system
$M = (St, R, L)$, where $St$ is a nonempty and finite set of states, $R \subseteq St \times St$ is the
transition relation, and $L : St \to 2^{AP}$ is the labelling function that maps each state to the
set of atomic propositions true in that state.

In order to define the semantics of μ-calculus formulas, we use an environment
$e : VAR \to 2^{St}$, which associates with each relational variable a set of states from $M$.

Given a transition system $M$ and an environment $e$, the semantics of a formula $f$,
denoted $[[f]]_M e$, is the set of states in which $f$ is true. We denote by $e[Q \leftarrow W]$ a new
environment that is the same as $e$ except that $e[Q \leftarrow W](Q) = W$. The set $[[f]]_M e$ is
defined recursively as follows (where $M$ is omitted when clear from the context).

• $[[p]]e = \{s \mid p \in L(s)\}$

• $[[Q]]e = e(Q)$

• $[[\neg g]]e = St \setminus [[g]]e$

• $[[g_1 \wedge g_2]]e = [[g_1]]e \cap [[g_2]]e$

• $[[g_1 \vee g_2]]e = [[g_1]]e \cup [[g_2]]e$

• $[[EX\,g]]e = \{s \mid \exists t\,[(s, t) \in R \text{ and } t \in [[g]]e]\}$

• $[[\mu Q.g]]e$ and $[[\nu Q.g]]e$ are the least and greatest fixpoints, respectively, of the
predicate transformer $\tau : 2^{St} \to 2^{St}$ defined by $\tau(W) = [[g]]e[Q \leftarrow W]$.

Tarski showed that least and greatest fixpoints always exist if $\tau$ is monotone. If
$\tau$ is also continuous, then the least and greatest fixpoints of $\tau$ can be computed by
$\bigcup_i \tau^i(False)$ and $\bigcap_i \tau^i(True)$, respectively. It has been shown that if $M$ is
finite then any monotone $\tau$ is also continuous.
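As a small worked instance of this iteration (an illustration added here, not taken from the paper), consider the least fixpoint formula $\mu Q.(p \vee EX\,Q)$, which holds in exactly the states from which a $p$-state is reachable. Writing $pre(W) = \{s \mid \exists t\,[(s,t) \in R \text{ and } t \in W]\}$, an auxiliary name assumed for this example, the transformer is $\tau(W) = [[p]] \cup pre(W)$ and the approximants unfold as follows:

```latex
\begin{align*}
\tau^{0}(False)   &= \emptyset, \\
\tau^{1}(False)   &= [[p]] \cup pre(\emptyset) = [[p]], \\
\tau^{2}(False)   &= [[p]] \cup pre([[p]]), \\
\tau^{i+1}(False) &= [[p]] \cup pre(\tau^{i}(False)).
\end{align*}
```

Because $\tau$ is monotone, these sets form an increasing chain of subsets of $St$. Since $St$ is finite, the chain can grow strictly at most $|St|$ times, so the iteration reaches the least fixpoint after finitely many steps.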

In this paper we consider only monotone formulas. Since we consider only finite
transition systems, they are also continuous. The function fixpt on the right-hand
side of Figure 1 describes an algorithm for computing the least or greatest fixpoint,
depending on the initialization of $Q_{val}$. If the parameter $I$ is False then the least
fixpoint is computed. Otherwise, if $I = True$, then the greatest fixpoint is computed.

Given a transition system $M$, an environment $e$, and a formula $f$ of the μ-calculus,
the model checking algorithm for μ-calculus finds the set of states in $M$ that satisfy $f$.
Figure 1 presents a sequential recursive algorithm for evaluating μ-calculus formulas.
For closed μ-calculus formulas, the initial environment is irrelevant. The necessary
environments are constructed during recursive applications of the ev function.
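The sequential algorithm can be sketched in Python as follows. This is a minimal explicit-state illustration rather than the symbolic BDD-based implementation the paper has in mind: the `Model` class, the tuple encoding of formulas, and the use of `frozenset` for state sets are all assumptions made for the example, and the environment is copied instead of updated in place.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

States = FrozenSet[int]


class Model:
    """Explicit-state transition system M = (St, R, L) (hypothetical helper)."""

    def __init__(self, states: Iterable[int],
                 trans: Iterable[Tuple[int, int]],
                 label: Dict[int, Set[str]]) -> None:
        self.states: States = frozenset(states)
        self.trans = frozenset(trans)
        self.label = label


def ev(m: Model, f: tuple, e: Dict[str, States]) -> States:
    """Return the set of states of m that satisfy formula f under environment e."""
    kind = f[0]
    if kind == "ap":      # f = p
        return frozenset(s for s in m.states if f[1] in m.label.get(s, set()))
    if kind == "var":     # f = Q
        return e[f[1]]
    if kind == "not":     # f = not g
        return m.states - ev(m, f[1], e)
    if kind == "or":      # f = g1 or g2
        return ev(m, f[1], e) | ev(m, f[2], e)
    if kind == "and":     # f = g1 and g2
        return ev(m, f[1], e) & ev(m, f[2], e)
    if kind == "EX":      # f = EX g: states with a successor satisfying g
        targets = ev(m, f[1], e)
        return frozenset(s for (s, t) in m.trans if t in targets)
    if kind == "mu":      # least fixpoint: start from False (empty set)
        return fixpt(m, f[1], f[2], e, frozenset())
    if kind == "nu":      # greatest fixpoint: start from True (all states)
        return fixpt(m, f[1], f[2], e, m.states)
    raise ValueError(f"unknown formula kind: {kind!r}")


def fixpt(m: Model, q: str, g: tuple, e: Dict[str, States], init: States) -> States:
    """Iterate Qval = ev(g, e[Q <- Qold]) until the value stops changing."""
    q_val = init
    while True:
        q_old = q_val
        q_val = ev(m, g, {**e, q: q_old})
        if q_val == q_old:
            return q_val


# Example: 0 -> 1 -> 2 -> 2 and 3 -> 3, with p holding only in state 2.
model = Model(range(4), {(0, 1), (1, 2), (2, 2), (3, 3)}, {2: {"p"}})
ef_p = ("mu", "Q", ("or", ("ap", "p"), ("EX", ("var", "Q"))))
reach_p = ev(model, ef_p, {})
```

In the example, `ef_p` encodes $\mu Q.(p \vee EX\,Q)$, so `reach_p` contains the states from which a $p$-state is reachable. Termination of `fixpt` relies on the formula being monotone and the state set being finite, as assumed in the text.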

2.2 Elements of Distributed Symbolic Model Checking 

Our distributed algorithm involves several basic elements that were developed in earlier work.
For completeness, we briefly mention these elements in this subsection.

Sets of states, as well as intermediate results, are represented by BDDs. During the
algorithm execution, the sets of states obtained are partitioned among the processes. A set
of window functions is used to define the partitioning, determining the slice that is stored
(we say: owned) by each process.

 1 function ev(f, e)
 2   case
 3     f = p:        res = {s | p ∈ L(s)}
 4     f = Q:        res = e(Q)
 5     f = ¬g:       res = ¬ev(g, e)
 6     f = g1 ∨ g2:  res = ev(g1, e) ∨ ev(g2, e)
 7     f = g1 ∧ g2:  res = ev(g1, e) ∧ ev(g2, e)
 8     f = EX g:     res = {s | ∃t [sRt ∧ t ∈ ev(g, e)]}
 9     f = μQ.g:     res = fixpt(Q, g, e, False)
10     f = νQ.g:     res = fixpt(Q, g, e, True)
11   endcase
12   return (res)
13 end function

 1 function fixpt(Q, g, e, I)
 2   Qval = I
 3   repeat
 4     Qold = Qval
 5     Qval = ev(g, e[Q ← Qold])
 6   until (Qval = Qold)
 7   return Qval
 8 end function

Fig. 1. Pseudo-code for Sequential μ-Calculus Model Checking.

Definition 1. [Complete Set of Window Functions] A window function is a boolean
function that characterizes a subset of the state space. A set of window functions
$W_1, \ldots, W_k$ is complete if and only if for every $1 \le i, j \le k$ with $i \ne j$,
$W_i \wedge W_j = 0$, and $\bigvee_{i=1}^{k} W_i = 1$.

Unless otherwise stated, we assume that all sets of window functions are complete. 
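The completeness condition can be checked directly when window functions are represented as explicit sets of states. The sketch below makes that assumption (the paper represents them as boolean functions over BDDs), and the function name `is_complete` is introduced only for this example.

```python
from itertools import combinations
from typing import Iterable, List, Set


def is_complete(windows: List[Set[int]], state_space: Iterable[int]) -> bool:
    """True iff the windows are pairwise disjoint and together cover the state space."""
    pairwise_disjoint = all(not (a & b) for a, b in combinations(windows, 2))
    covering = set().union(*windows) == set(state_space)
    return pairwise_disjoint and covering


# Two windows that split the state space {0, 1, 2, 3} in half.
halves_ok = is_complete([{0, 1}, {2, 3}], range(4))
```

The first condition corresponds to $W_i \wedge W_j = 0$ for $i \ne j$, and the second to $\bigvee_{i=1}^{k} W_i = 1$.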

We use a slicing algorithm described in earlier work to get a set of window functions.
The objective of this algorithm is to distribute a given set evenly among the nodes. Its
input is a set of states, and its output is a set of window functions. These functions slice
the input set into subsets that are approximately of the same size.

Maintaining balanced memory requirements among the processes is done by means of a
memory balance algorithm described in earlier work. When this algorithm is applied to an
already sliced set of states, a new partitioning is computed, one that balances the
size of the subsets. The new partitioning is computed by pairing a large slice of the set
with a small one and re-slicing their union. This algorithm defines a new set of window
functions that will be used to produce further intermediate results.
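The pairing idea can be illustrated with a rough sketch. Several simplifications are assumed here and are not from the paper: slices are explicit sets, size is the number of states rather than BDD nodes, the largest slice is always paired with the smallest, and a union is re-sliced by cutting its sorted contents in half. The name `rebalance` is hypothetical.

```python
from typing import List, Set


def rebalance(slices: List[Set[int]]) -> List[Set[int]]:
    """Pair large slices with small ones and split each pair's union evenly."""
    order = sorted(range(len(slices)), key=lambda i: len(slices[i]))
    new_slices = [set(s) for s in slices]
    lo, hi = 0, len(order) - 1
    while lo < hi:
        small, large = order[lo], order[hi]
        merged = sorted(slices[small] | slices[large])
        half = len(merged) // 2
        new_slices[small] = set(merged[:half])   # re-slice the union
        new_slices[large] = set(merged[half:])
        lo += 1
        hi -= 1
    return new_slices


# Process 0 holds most of the set before balancing.
before = [{0, 1, 2, 3, 4, 5, 6}, {7}, {8, 9}, {10, 11, 12}]
after = rebalance(before)
```

The global set is unchanged by this step; only its distribution among the processes, and hence the implied window functions, changes.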

During the memory balance algorithm, as well as during other parts of the distributed
model checking algorithm, BDDs are shipped between the processes. The communication
uses a compact and universal BDD representation described in earlier work.
Different variable orders are allowed in the different processes.



3 Distributed Model Checking for μ-Calculus

The general idea of the distributed algorithm is as follows. The algorithm consists of
two phases. The initial phase starts as the sequential algorithm, described in Section 2.1.
It terminates when the memory requirement reaches a given threshold. At this point, the
distributed phase begins. In order to distribute the work among the processes, the state
space is partitioned into several parts, using a slicing procedure. Throughout the
distributed phase, each process owns one part of the state space for every set of states
associated with a certain subformula. When computation of a subformula produces
states owned by other processes, these states are sent out to the respective processes.
A memory balancing mechanism is used to repartition imbalanced sets of states that
are produced during the computation. A distributed termination algorithm is used to
announce global termination. In the rest of this section, we describe the elements used by
this algorithm.



3.1 Switching from Initial to Distributed Computation 

When the initial phase terminates, several subformulas have already been evaluated and 
the sets of states associated with them are stored. In order to start the distributed phase, 
we slice the sets of states found so far and distribute the slices among the processes. 

Each set of states is represented by a BDD and its size is measured by the number of 
BDD nodes. All sets are managed by the same BDD manager, where parts of the BDDs 
that are used by several sets are shared and stored only once. Thus, when partitioning 
the sets, there are two factors involved: the required storage space for the sets, and the 
space needed to manipulate them. In order to keep the first factor small, it is best to 
partition the sets so that the space used by the BDD manager for all sets in each process 
is small. To keep the second factor small, observe that the memory used in performing 
an operation is proportional to the size of the set it is applied to, thus the part of each 
set in each process should be small. 

In model checking, the most acute peaks in memory requirement usually occur 
while operations are performed. Thus, it is more important to reduce the second fac- 
tor. Indeed, rather than minimizing the total size of each process, our algorithm slices 
each set in a way that reduces the size of its parts. It is important to note that as a result 
the slicing criterion may differ for different sets. 

We use the slicing algorithm described generally in Section 2.2. In order to slice
all the sets that were already evaluated at the point of phase switching, slicing is
applied to each one of them.

While the slicing algorithm works, it updates two tables: InitEval and InitSet.
InitEval keeps track of which sets have been evaluated by the initial phase of the
algorithm: InitEval(f) is True if and only if f has been evaluated by the initial algorithm.
Each process id has the table InitSet that, for each formula f, holds the subset of
the set of states satisfying f that is owned by this process. Formally, for each process id,
$InitSet(f) = f \wedge W_{id}$. The distributed phase starts by sending the tables InitEval
and InitSet and the list of slices $W_i$ to all the processes.
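A minimal sketch of how these two tables might be built at the phase switch is given below. It assumes explicit state sets, formulas identified by plain strings, and a single set of windows for all formulas, whereas the text notes that the slicing criterion may differ between sets. The function `switch_phase` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def switch_phase(evaluated: Dict[str, Set[int]],
                 windows: List[Set[int]]
                 ) -> Tuple[Dict[str, bool], List[Dict[str, Set[int]]]]:
    """Build InitEval and one InitSet table per process from the initial phase."""
    # InitEval(f) is True exactly for the formulas evaluated so far.
    init_eval = {f: True for f in evaluated}
    # InitSet(f) for process id is f AND W_id.
    init_set = [{f: states & w for f, states in evaluated.items()}
                for w in windows]
    return init_eval, init_set


# Two subformulas were evaluated before the memory threshold was reached.
evaluated = {"p": {0, 2, 3}, "EX p": {1, 2}}
init_eval, init_set = switch_phase(evaluated, [{0, 1}, {2, 3}])
```

A formula that was not evaluated in the initial phase is simply absent from `init_eval`, so `init_eval.get(f, False)` plays the role of the test InitEval(f) = True.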

3.2 The Distributed Phase 

The distributed version of the model checking algorithm for the μ-calculus is given in
Figure 2. While the sequential algorithm finds the set of states in a given model that
satisfy a formula of the μ-calculus, in the distributed algorithm each process finds
the part of this set that the process owns. Intuitively, the distributed algorithm works
as follows: given a set of slices $W_i$, a formula $f$, and an environment $e$, the process id
finds the set of states $ev(f, e) \wedge W_{id}$.

In fact, a weaker property is required in order to guarantee the correctness of the
algorithm. We only need to know that when evaluating a formula $f$, every state satisfying
$f$ is collected by at least one of the processes. For efficiency, however, we require in
addition that every state is collected by exactly one process.
Given a formula $f$, the algorithm first checks if the initial phase has already evaluated
it by checking if InitEval(f) = True. If so, it uses the result stored in InitSet(f).
Otherwise, it evaluates the formula recursively. Each recursive application associates a
set of states with some subformula.

Keeping the work load balanced is an inherent problem in distributed computation. If the
memory requirement in one of the processes is significantly larger than in the others, then
the effectiveness of the distributed system is destroyed. To avoid this situation,
whenever a new set of states is created, a memory balance procedure is invoked to keep
the memory requirement of the new set balanced. The memory balance procedure changes
the slices $W_i$ and updates the parts of the new set in each of the processes accordingly.
Each process in the distributed algorithm evaluates each subformula $f$ as follows (see
Figure 2):

A propositional formula $p \in AP$: evaluated by collecting all the states $s$ that satisfy two
conditions: $p$ is in the labelling $L(s)$ of $s$ and in addition $s$ is owned by this process.

A relational variable $Q$: evaluated using the local environment of the process. Since
only closed μ-calculus formulas are evaluated, the environment must have a value for
$Q$ (computed in a previous step).

A subformula of the form $\neg g$: evaluated by first evaluating $g$, and then using the special
function exchnot. Given a set of states $S$ and a partition $S_1, \ldots, S_k$ of $S$, each process
$i$ runs the procedure exchnot on $S_i$. The process informs all other processes of the
states that do not belong to $S$ "as far as it knows". Since each state in $S$ belongs to
some process, if none of the processes knows that $s$ is in $S$, then $s$ is in $\neg S$.

Since each process keeps only the states that it owns, the processes actually
send each other only states that are owned by the receiver. This reduces communication.
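The effect of exchnot can be simulated in a single Python program by computing, for every process, what it would hold after all messages have been received. The sketch assumes explicit state sets and replaces the sendto/receivefrom message passing of the paper by direct list access. The name `exchnot_all` is introduced only for this illustration.

```python
from typing import List, Set


def exchnot_all(local_sets: List[Set[int]],
                windows: List[Set[int]],
                state_space: Set[int]) -> List[Set[int]]:
    """Result of exchnot in every process: the owned part of the complement."""
    k = len(windows)
    results = []
    for i in range(k):
        res = set(windows[i])
        for j in range(k):
            # Process j reports the states of W_i that are not in S_j;
            # a state survives only if no process claims it is in S.
            res &= (state_space - local_sets[j]) & windows[i]
        results.append(res)
    return results


# Global S = {0, 3, 4}; process 0 currently holds state 3 although it does not own it.
state_space = set(range(6))
windows = [{0, 1, 2}, {3, 4, 5}]
neg_parts = exchnot_all([{0, 3}, {4}], windows, state_space)
```

Taking the union of the returned parts gives the complement of the global set, which is the property used for the $\neg g$ case in the correctness proof.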

A subformula of the form $g_1 \vee g_2$: evaluated by first evaluating $g_1$ and $g_2$, possibly with
different slicing functions. This means that a process can hold a part of $g_1$ with respect
to one slicing and a part of $g_2$ with respect to another slicing. Nevertheless, since each
state of $g_1$ and of $g_2$ belongs to one of the processes, each state of $g_1 \vee g_2$ now belongs
to one of the processes. Applying the function exch results in a correct distribution of
the states among the processes, according to the current slicing.

A subformula of the form $g_1 \wedge g_2$ can be translated using De Morgan's laws to
$\neg(\neg g_1 \vee \neg g_2)$. However, evaluating the translated formula requires four communication
phases (via exch and exchnot). Instead, such a formula is evaluated by first evaluating
$g_1$ and $g_2$. As in the previous case, they might be evaluated with respect to different
window functions. Here, however, the slicing of the two formulas must agree before a
conjunction can be applied. This is achieved by applying exch twice, so the overall
communication is reduced to only two rounds.
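The need for re-slicing before a conjunction can be seen in a small simulation. As in the rest of these sketches, explicit sets stand in for BDDs and the exchange is simulated inside one program. `exch_all` and `distributed_and` are hypothetical helper names.

```python
from typing import List, Set


def exch_all(local_sets: List[Set[int]], windows: List[Set[int]]) -> List[Set[int]]:
    """Result of exch in every process: all states of the global set it owns."""
    k = len(windows)
    return [set().union(*(local_sets[j] & windows[i] for j in range(k)))
            for i in range(k)]


def distributed_and(parts1: List[Set[int]], parts2: List[Set[int]],
                    windows: List[Set[int]]) -> List[Set[int]]:
    """Align both operands to the current windows, then intersect locally."""
    aligned1 = exch_all(parts1, windows)
    aligned2 = exch_all(parts2, windows)
    return [a & b for a, b in zip(aligned1, aligned2)]


# g1 = {0, 1, 2, 3} and g2 = {1, 2, 3, 4}, held under two different slicings.
windows = [{0, 1, 2}, {3, 4, 5}]
parts_g1 = [{0, 3}, {1, 2}]
parts_g2 = [{1, 2, 4}, {3}]
naive = [a & b for a, b in zip(parts_g1, parts_g2)]
correct = distributed_and(parts_g1, parts_g2, windows)
```

Here `naive` intersects the local parts without re-slicing and loses every state of $g_1 \wedge g_2$, while `correct` uses two exchanges and recovers the full conjunction.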

A subformula of the form $EX\,g$: evaluated by first evaluating $g$ and then computing the
pre-image using the transition relation $R$. Since every state of $g$ belongs to one of the
processes, every state of the pre-image also belongs to one. In fact, a state may be
computed by more than one process if it is obtained as a pre-image of two parts. Applying
exch completes the evaluation correctly.

Subformulas of the form $\mu Q.g$ and $\nu Q.g$ (the least fixpoint and greatest fixpoint,
respectively): evaluated using a special function fixpt that iterates until a fixpoint is
found. The computations for the two formulas differ only in the initialization, which is
False for $\mu Q.g$ and the current window function $W_{id}$ for $\nu Q.g$.

 1 function pev(f, e)
 2   case
 3     InitEval(f):  return (InitSet(f))
 4     f = p:        res = {s | p ∈ L(s)} ∧ W_id
 5     f = Q:        return (e(Q))
 6     f = ¬g:       res = exchnot(pev(g, e))
 7     f = g1 ∨ g2:  res = exch(pev(g1, e) ∨ pev(g2, e))
 8     f = g1 ∧ g2:  res1 = pev(g1, e)   res2 = pev(g2, e)
 9                   res = exch(res1) ∧ exch(res2)
10     f = EX g:     res = exch({s | ∃t [sRt ∧ t ∈ pev(g, e)]})
11     f = μQ.g:     res = fixpt(Q, g, e, False)
12     f = νQ.g:     res = fixpt(Q, g, e, W_id)
13   endcase
14   IdBlnc(res)     /* balances W; updates res accordingly */
15   return (res)
16 end function

 1 function fixpt(Q, g, e, init)
 2   Qval = init
 3   repeat
 4     Qold = Qval
 5     Qval = pev(g, e[Q ← Qold])
 6   until (parterm(exch(Qval) = exch(Qold)))
 7   return Qval
 8 end function

 1 function exch(S)
 2   res = S ∧ W_id
 3   for each process i ≠ id
 4     sendto(i, S ∧ W_i)
 5   for each process i ≠ id
 6     res = res ∨ receivefrom(i)
 7   return res
 8 end function

 1 function exchnot(S)
 2   res = (¬S) ∧ W_id
 3   for each process i ≠ id
 4     sendto(i, (¬S) ∧ W_i)
 5   for each process i ≠ id
 6     res = res ∧ receivefrom(i)
 7   return res
 8 end function

Fig. 2. Pseudo-code for a Process id in the Distributed Model Checking
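To see how pev, fixpt, exch and parterm interact, the sketch below simulates the distributed evaluation of the single formula $\mu Q.(p \vee EX\,Q)$ in lock-step inside one Python program. It is an illustration under several assumptions that are not from the paper: explicit sets replace BDDs, the windows stay fixed (the load balancing step is omitted), every process sees the whole transition relation, and message passing is replaced by list access. All function names are hypothetical.

```python
from typing import List, Set, Tuple


def exch_all(local_sets: List[Set[int]], windows: List[Set[int]]) -> List[Set[int]]:
    """Result of exch in every process: all states of the global set it owns."""
    k = len(windows)
    return [set().union(*(local_sets[j] & windows[i] for j in range(k)))
            for i in range(k)]


def parterm(flags: List[bool]) -> bool:
    """Return True to all processes only if every process reported True."""
    return all(flags)


def pre_image(trans: Set[Tuple[int, int]], targets: Set[int]) -> Set[int]:
    """States with at least one successor in targets."""
    return {s for (s, t) in trans if t in targets}


def distributed_ef(trans: Set[Tuple[int, int]], p_states: Set[int],
                   windows: List[Set[int]]) -> List[Set[int]]:
    """Lock-step simulation of fixpt for the formula mu Q. (p or EX Q)."""
    k = len(windows)
    q_val: List[Set[int]] = [set() for _ in range(k)]     # init = False
    while True:
        q_old = q_val
        # pev(EX Q): local pre-image of the local value of Q, then exch.
        ex_parts = exch_all([pre_image(trans, q_old[i]) for i in range(k)], windows)
        # pev(p or EX Q): local disjunction, then exch.
        q_val = exch_all([(p_states & windows[i]) | ex_parts[i] for i in range(k)],
                         windows)
        new, old = exch_all(q_val, windows), exch_all(q_old, windows)
        if parterm([new[i] == old[i] for i in range(k)]):
            return q_val


# Same model as a chain 0 -> 1 -> 2 -> 2 plus an isolated loop 3 -> 3; p holds in 2.
trans = {(0, 1), (1, 2), (2, 2), (3, 3)}
parts = distributed_ef(trans, {2}, [{0, 1}, {2, 3}])
```

The union of `parts` is the set that the sequential algorithm would compute for the same formula, and each part stays inside the window of its process.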



3.3 Sources of Scalability 

The efficiency of a parallelization approach is determined by the ratio between
computation complexity, normalized by computation speed, and communication complexity,
normalized by communication bandwidth. In our parallel model checking algorithm,
this ratio (excluding normalization, which is dependent on the underlying platform) can
be estimated by observing that the peak memory requirement for a single μ-calculus
operation of a symbolic computation is a lower bound on the computation complexity of this
operation. On average, in the distributed setup, the size of the BDD structures that are sent
(received) by a process is a fraction of its BDD manager size at the end of the operation
(after memory balance). Thus, roughly speaking, for a single operation,
peak memory utilization bounds the computation complexity from below, whereas the
size of the BDD manager represents the communication complexity. General wisdom
holds that the ratio between peak and manager sizes reaches 2 or 3 orders of magnitude,
which, for current computing platforms, is sufficient to keep the processor and
communication subsystems equally busy. Indeed, our experiments with previous parallel
symbolic computations in a distributed setup consisting of a slow network confirmed
the efficiency of this approach.

Scalability of a parallel system is the ability to include more processes in order to
handle larger inputs of higher complexity. Linear scalability is used to describe a
parallel system that does not lose performance while scaling up. Recall that the volume
of communication performed by a single process in our algorithm during a single
operation may be represented on average by a fraction of its BDD manager size at the
end of the operation. Also, the corresponding peak memory that is used by the
process during that operation is bounded by the size of its memory module (otherwise the
operation overflows). By the above-mentioned ratio between the sizes of the peak and
the BDD manager, the manager size (in between operations) is also bounded. Thus,
using our effective slicing procedure, the local BDD manager size does not increase
when the system is scaled up globally in order to check larger models using more
processes. Hence, the ratio between computation and communication for each process does
not vary substantially when the system scales up, implying almost linear scalability of
our distributed model checking algorithm.

Finally, we note that a higher ratio of peak to BDD manager sizes, which may result
from a larger transition system in larger models, will enhance the scalability of our
parallel approach. Since the size of the memory module limits the peak size, a higher ratio
implies a smaller BDD manager, which, in turn, implies lower communication volumes.
Thus, when the checked models grow, the method may exhibit super-linear scalability.

4 Correctness 

In this section we prove the correctness of the distributed algorithm, assuming the
sequential algorithm is correct. The sequential algorithm evaluates a formula by computing
the set of states satisfying this formula. In the distributed algorithm every such set
is partitioned among the processes. The union over all the partitions for a given
subformula is called the global set. In the proof we show that, for every μ-calculus formula,
the set of states computed by the sequential algorithm is identical to the global set
computed by the distributed algorithm. Note that the global set is never actually computed
and is introduced only for the sake of the correctness proof. In the proof that follows
we need the following definition.

Definition 2. [Well Partitioned Environment] An environment $e$ is well partitioned by
parts $e_1, \ldots, e_k$ if and only if, for every $Q \in VAR$, $e(Q) = \bigvee_{i=1}^{k} e_i(Q)$.

The procedure exch is applied by all processes to (not necessarily disjoint) subsets
$S_i$ that cover a set res. Given a set of window functions, the processes exchange
non-owned parts so that at termination each process has all the states from res that it owns.
The set of window functions does not change.

Let $f$ be a μ-calculus formula and $e_{id}$ be the environment in process id. $pev_{id}(f, e_{id})$
denotes the set of states returned by procedure pev when run by process id on $f$ and
$e_{id}$.
Theorem 1 defines the relationship between the outputs of the sequential and the
distributed algorithms.

Theorem 1 (Correctness). Let $f$ be a μ-calculus formula, $e$ be an environment well
partitioned by $e_1, \ldots, e_k$, $e'$ be the environment when $ev(f, e)$ terminates and, for all
$i = 1, \ldots, k$, $e'_i$ be the environment when $pev_i(f, e_i)$ terminates. Then $e'$ is well
partitioned by $e'_1, \ldots, e'_k$ and $ev(f, e) = \bigvee_{i=1}^{k} pev_i(f, e_i)$.

Proof: We prove the theorem by induction on the structure of $f$. In all but the last two
cases of the induction step the environments are not changed and therefore $e'$ is well
partitioned by $e'_1, \ldots, e'_k$. Due to lack of space we only consider several of the more
interesting cases.

Base: $f = p$ for $p \in AP$. Immediate.

Induction: 

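$f = Q$, where $Q \in VAR$ is a relational variable: $\bigvee_{i=1}^{k} pev_i(Q, e_i) = \bigvee_{i=1}^{k} e_i(Q)$.
Since $e$ is well partitioned, $e(Q) = \bigvee_{i=1}^{k} e_i(Q)$, which is equal to $ev(f, e)$.

$f = \neg g$: $pev_{id}(\neg g, e_{id})$ first applies $pev_{id}(g, e_{id})$, which results in $S_{id}$. It then runs
the procedure $exchnot(S_{id})$, which returns the result $res_{id}$:

$$res_{id} = ((\neg S_{id}) \wedge W_{id}) \wedge \bigwedge_{j \ne id} ((\neg S_j) \wedge W_{id}) = \bigwedge_{j=1}^{k} ((\neg S_j) \wedge W_{id}).$$

When exchnot terminates in all processes, the global set computed by all processes
is (recall that $\bigvee_{i=1}^{k} W_i = 1$):

$$\bigvee_{i=1}^{k} res_i = \bigvee_{i=1}^{k} \Big( \bigwedge_{j=1}^{k} (\neg S_j) \wedge W_i \Big) = \bigwedge_{i=1}^{k} (\neg S_i) \wedge \bigvee_{i=1}^{k} W_i = \bigwedge_{i=1}^{k} (\neg S_i) = \neg \bigvee_{i=1}^{k} S_i.$$

Since $S_i = pev_i(g, e_i)$, we have $\neg \bigvee_{i=1}^{k} S_i = \neg \bigvee_{i=1}^{k} pev_i(g, e_i)$, which by the induction
hypothesis is identical to $\neg ev(g, e)$. This, in turn, is identical to $ev(\neg g, e)$. Applying
IdBlnc at the end of pev repartitions the subsets between the processes; however,
their disjunction remains the same. Thus, $ev(\neg g, e) = \bigvee_{i=1}^{k} pev_i(\neg g, e_i)$.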

$f = g_1 \vee g_2$: $pev_{id}(f, e_{id})$ first computes $pev_{id}(g_1, e_{id}) \vee pev_{id}(g_2, e_{id})$. At the end
of this computation, the global set is:

$$\bigvee_{i=1}^{k} \big( pev_i(g_1, e_i) \vee pev_i(g_2, e_i) \big) = \bigvee_{i=1}^{k} pev_i(g_1, e_i) \vee \bigvee_{i=1}^{k} pev_i(g_2, e_i).$$

By the induction hypothesis, this is identical to $ev(g_1, e) \vee ev(g_2, e)$, which is identical
to $ev(g_1 \vee g_2, e)$. Applying the procedures exch and IdBlnc changes the partition of
the sets among the processes, but not the global set.

$f = \mu Q.g$, a least fixpoint formula: As in previous cases, we would like to prove that
$\bigvee_{i=1}^{k} pev_i(\mu Q.g, e_i) = ev(\mu Q.g, e)$. Since IdBlnc does not change the correctness
of this claim, we only need to prove that
$\bigvee_{i=1}^{k} fixpt_i(Q, g, e_i, False) = fixpt(Q, g, e, False)$.
In addition, we need to show that the environment remains
well partitioned when the computation terminates. The following lemma proves stronger
requirements. The lemma uses the following property of procedure parterm.

Property 1. Procedure parterm is invoked by each of the processes with a boolean
parameter. If all parameters are True, then parterm returns True to all processes.
Otherwise, it returns False to all processes.

Lemma 1. Let $Q^j$ be the value of $Q_{val}$ in iteration $j$ of the sequential fixpoint
algorithm. Similarly, let $Q^j_{id}$ be the value of $Q_{val}$ in iteration $j$ of the distributed fixpoint
algorithm in process id. $Q^0$ is the initialization of the sequential algorithm; $Q^0_{id}$ is the
initialization of the distributed algorithm. Then:

• In every iteration, $e$ is well partitioned by $e_1, \ldots, e_k$.

• For every $j$: $Q^j = \bigvee_{i=1}^{k} Q^j_i$.

• If the sequential fixpt algorithm terminates after $i_0$ iterations then so does the
distributed fixpoint algorithm.

Proof: We prove the lemma by induction on the number $j$ of iterations in the loop of
the sequential function fixpt.

Base: $j = 0$:

• At iteration 0, $e$ is well partitioned based on the induction hypothesis of Theorem 1.

• In case $f = \mu Q.g$, the initialization of the sequential algorithm, as well as of the
distributed algorithm, is False. Hence, $Q^0_i = False$, which implies $Q^0 = \bigvee_{i=1}^{k} Q^0_i$.

• Both algorithms perform at least one iteration, so they do not terminate at iteration 0.

Induction: Assume Lemma 1 holds for iteration $j$. We prove it for iteration $j + 1$.

• Let $e', e'_1, \ldots, e'_k$ be the environments at the end of iteration $j + 1$, and assume that
$e$ is well partitioned by $e_1, \ldots, e_k$ at the end of iteration $j$. The only changes to the
environments in iteration $j + 1$ may occur in line 5 of the distributed and sequential
algorithms. In the sequential algorithm $e$ may be changed in two ways: $e(Q)$ is assigned
a new value $Q^j$, or a recursive call to ev may change $e$. Similarly, in the distributed
algorithm two changes may occur: $e_{id}(Q)$ is assigned a new value $Q^j_{id}$, or a recursive
call to $pev_{id}$ may change $e_{id}$.

By the induction hypothesis of Lemma 1 we know that $Q^j = \bigvee_{i=1}^{k} Q^j_i$; hence
$e[Q \leftarrow Q^j](Q) = \bigvee_{i=1}^{k} e_i[Q \leftarrow Q^j_i](Q)$. Since no other change has been made to
the environments, and since $e$ is well partitioned, we conclude that $e[Q \leftarrow Q^j]$ is well
partitioned by $e_1[Q \leftarrow Q^j_1], \ldots, e_k[Q \leftarrow Q^j_k]$.

In iteration $j + 1$, ev is now invoked with an environment that is well partitioned
by the environments $pev_{id}$ is invoked with. The induction hypothesis of Theorem 1
therefore guarantees that $e'$ is well partitioned by $e'_1, \ldots, e'_k$.

• $Q^{j+1} = ev(g, e[Q \leftarrow Q^j])$ (line 5 of the sequential algorithm) and
$Q^{j+1}_{id} = pev_{id}(g, e_{id}[Q \leftarrow Q^j_{id}])$ (line 5 of the distributed algorithm).

By the first bullet above, $e[Q \leftarrow Q^j]$ is well partitioned. Thus, the induction
hypothesis of Theorem 1 is applicable and implies that
$ev(g, e[Q \leftarrow Q^j]) = \bigvee_{i=1}^{k} pev_i(g, e_i[Q \leftarrow Q^j_i])$. Hence, $Q^{j+1} = \bigvee_{i=1}^{k} Q^{j+1}_i$.

• The sequential fixpt procedure terminates at iteration $j + 1$ if $Q^{j+1} = Q^j$. We
prove that this holds if and only if for every process id, $exch(Q^j_{id}) = exch(Q^{j+1}_{id})$, and
therefore parterm returns True to all processes.

Let $W_1, \ldots, W_k$ be the current window functions. By the second bullet above,
$Q^j = \bigvee_{i=1}^{k} Q^j_i$ and $Q^{j+1} = \bigvee_{i=1}^{k} Q^{j+1}_i$.

$$\forall id\,[exch(Q^j_{id}) = exch(Q^{j+1}_{id})] \iff \forall id\,\Big[\bigvee_{i=1}^{k} Q^j_i \wedge W_{id} = \bigvee_{i=1}^{k} Q^{j+1}_i \wedge W_{id}\Big] \iff \forall id\,[Q^j \wedge W_{id} = Q^{j+1} \wedge W_{id}] \iff Q^j = Q^{j+1}.$$

The last equality is implied by the previous one since the window functions are
complete. This completes the proof of the lemma and also the proof of the theorem. Q.E.D.

The above theorem can be extended to state that when all procedures $pev_{id}(f, e_{id})$
terminate, the subsets owned by each of the processes are disjoint. This is important in
order to avoid duplication of work. However, it is not necessary for the correctness of
the model checking algorithm.



5 Scalable Distributed Pre-image Computation 

The main goal of our distributed algorithm is to reduce the memory requirement. In
symbolic model checking, pre-image is one of the operations with the highest memory
requirement. Given a set of states $S$, pre-image computes $pred(S)$ (also denoted by
EX $S$ in the μ-calculus), which is the set of all predecessors of states in $S$. The pre-image
operation can be described by the formula $pred(S) = \exists s'[R(s, s') \wedge S(s')]$. It is easy
to see that the memory requirement of this operation grows with the sizes of the transition
relation $R$ and the set $S$. Furthermore, intermediate results sometimes exceed the
memory capacity even when $pred(S)$ can be held in memory.
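The pre-image formula can be illustrated on an explicit (non-symbolic) toy model. The sketch below is only an illustration under stated assumptions: states are plain Python values, $R$ is a set of (source, target) pairs, and the model itself is invented for the example. A symbolic model checker would perform the same computation on BDDs rather than on enumerated sets.

```python
from typing import Hashable, Iterable, Set, Tuple

State = Hashable
Transition = Tuple[State, State]  # (source s, target s')


def pred(R: Iterable[Transition], S: Set[State]) -> Set[State]:
    """Return {s | there exists s' with R(s, s') and s' in S}."""
    return {s for (s, s_next) in R if s_next in S}


# Toy model, invented for illustration: 0 -> 1 -> 2 -> 2 and 3 -> 1.
R = {(0, 1), (1, 2), (2, 2), (3, 1)}

print(pred(R, {2}))  # predecessors of state 2
print(pred(R, {1}))  # predecessors of state 1
```

On explicit sets the cost is linear in the size of $R$; the memory problem discussed in the text arises only for the symbolic representation.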

Our distributed algorithm reduces memory requirements by slicing each of the com- 
puted sets of states. This takes care of the S parameter of pre-image, but not of R. In 
order to make our method scalable for very large models, we need to reduce the size of 
the transition relation as well. 

The transition relation consists of pairs of states. We distinguish between the source
states and the target states by referring to the latter as $St'$. Thus, $R \subseteq St \times St'$.

A reduction of the second parameter of $R$, $St'$, can be achieved by applying the
well-known restriction operator. Prior to any application of pre-image, a process
that owns a slice $S_i$ of $S$ reduces its copy of $R$ by restricting $St'$ to $S_i$. This reduction is
dynamic since pre-image operations are applied to different sets during model checking.

We further reduce $R$ by adding a static slicing of $St$ according to (possibly different)
window functions $U_1, \ldots, U_m$. The slicing algorithm presented earlier can be used to
produce $U_1, \ldots, U_m$, so that $R$ is partitioned into $m$ slices of similar size. Each slice
$R_j$ is a subset of $(St \cap U_j) \times St'$. Since $R$ does not change during the computation,
$U_1, \ldots, U_m$ do not change either.

Having $k$ window functions $W_1, \ldots, W_k$ for $S$ and $m$ window functions $U_1, \ldots, U_m$
for $R$, we use $k$ groups of $m$ processes each. All processes in the same group have
the same $W_i$, and hence own the same $S_i = S \cap W_i$. However, each process in the
group has a different $U_j$. Process $(i, j)$ with $W_i$ and $U_j$ computes the pre-image of $S_i$ by
$pred_j(S_i) = \exists s'[R_j(s, s') \wedge S_i(s')]$. Since $U_1, \ldots, U_m$ is a complete set of window
functions, $\bigvee_{j=1}^{m} pred_j(S_i) = pred(S_i)$. Thus, the group with window function $W_i$
computes the same set as process $i$ in the distributed algorithm presented earlier.
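The $k \times m$ arrangement can be checked on a small explicit example. The following is a minimal sketch, assuming window functions are represented as plain sets of states and using an invented six-state model; the function name `sliced_pred` is hypothetical and not taken from the paper. Each "process" $(i, j)$ only sees the transitions whose source lies in $U_j$ and whose target lies in $S_i$.

```python
def pred(R, S):
    """Pre-image on explicit sets: sources of transitions whose target is in S."""
    return {s for (s, t) in R if t in S}


def sliced_pred(R, S, W, U):
    """Compute pred_j(S_i) for every process (i, j).

    W: list of k window sets slicing S; U: list of m window sets slicing
    the source states of R. Both are assumed to be complete (they cover
    all states).
    """
    out = {}
    for i, W_i in enumerate(W):
        S_i = S & W_i
        for j, U_j in enumerate(U):
            # Static slice by U_j (sources) and dynamic restriction to S_i (targets).
            R_ij = {(s, t) for (s, t) in R if s in U_j and t in S_i}
            out[(i, j)] = pred(R_ij, S_i)
    return out


# Invented toy model with k = 2 and m = 3.
R = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (2, 5)}
S = {0, 3, 5}
W = [{0, 1, 2}, {3, 4, 5}]
U = [{0, 1}, {2, 3}, {4, 5}]

parts = sliced_pred(R, S, W, U)
for i in range(len(W)):
    group = set().union(*(parts[(i, j)] for j in range(len(U))))
    print(i, group == pred(R, S & W[i]))
```

Because the $U_j$ cover all source states, the union over each group reproduces $pred(S_i)$, and the union over all groups reproduces $pred(S)$.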

Once the computation is completed, procedure exch is applied to exchange non- 
owned states (according to Wi). Procedure IdBlnc is used to update the Wi window 
functions in order to balance the memory load. Both procedures are defined as before. 
However, when IdBlnc changes the window functions, all members in each of the 
groups should agree on the new window function. 
The figure above demonstrates a pre-image computation using a sliced transition relation
with $k = 2$ and $m = 3$. Given a set $S$ sliced into $S_1, S_2$ according to $W_1, W_2$
respectively, the pre-image of each slice $S_i$ is computed by three processes. Each process uses a
different slice of the transition relation, $R_1$, $R_2$ and $R_3$, according to $U_1$, $U_2$ and $U_3$.

The method suggested in this section applies slicing to the full transition relation
in case it can be held in memory, but is too large to enable a successful completion of
the pre-image operation. However, the transition relation is often given partitioned, i.e.,
as a set of small relations $N_i$, each defining the value of variable $v_i$ in the next
states. The partitioned transition relation is usually small, so it can be
constructed by one process and then sliced using a previously published algorithm. In
this case the model checking is done directly with the partitioned transition relation.



5.1 Distributed Construction of the Sliced Full Transition Relation 

The full transition relation $R$ is a conjunction of all $N_i$. Here we consider cases where
either $R$ or its construction cannot fit into the memory of a single process.

Our goal is to construct slices $R_j$ of $R$, with none of the processes ever holding $R$.
Each process starts the construction by gradually conjoining partitions $N_i$ until a threshold
is reached. The current (partial) transition relation is then partitioned among the
processes, using the slicing algorithm. Each process continues to conjoin the partitions
that have not been handled yet, until all partitions are conjoined. During conjunction,
further slicing or balancing is applied so that the final slices will be balanced.
1 Introduction

Since the introduction of temporal logic for the specification of computer programs,
usability has been an issue, because a difficult-to-use formalism is
a barrier to the wide adoption of formal methods. Our solution is Sugar, the
temporal logic used by the RuleBase formal verification tool. Sugar adds the
power of regular expressions to CTL, as well as an extensive set of operators
which provide syntactic sugar. That is, while these operators do not add expressive
power, they allow properties to be expressed more succinctly than in the
basic language. Experience shows that Sugar allows hardware engineers to easily
and intuitively specify their designs. The full language is used for model checking,
and a significant portion can be model checked on-the-fly. The automatic
generation of simulation checkers from the same portion of Sugar has been described
separately. While previous papers have described various features of the language, this
paper presents the first complete description of Sugar.

2 The Basic Language 

We use boolean expressions to describe states in the model, and Sugar Extended 
Regular Expressions to describe sequences of states, and define them as follows: 



Definition 1. (Boolean Expression).

1. Every atomic proposition is a boolean expression.

2. If $b$, $b_1$, and $b_2$ are boolean expressions, then so are $\neg b$ and $b_1 \wedge b_2$.

Definition 2. (Sugar Extended Regular Expressions (SEREs)).

1. Every boolean expression is a SERE.

2. If $r$, $r_1$, and $r_2$ are SEREs, then so are the following: i) $r_1, r_2$ ii) $r_1 \sim r_2$
iii) $r_1 || r_2$ iv) $r_1$ && $r_2$ v) $r[*]$

A comma denotes concatenation, $\sim$ denotes overlapping concatenation, where
the last state of $r_1$ coincides with the first state of $r_2$, || denotes disjunction, &&
denotes conjunction, and [*] is used to specify 0 or more repetitions.

There are two ways to use SEREs in Sugar formulas. The first is to link two
SEREs in order to form Sugar formulas of the linear fragment, as defined in
Definition 3 below. A second way is to link a single SERE with a general Sugar
formula, as defined in Definition 4 below.

Definition 3. (Sugar Formulas of the Linear Fragment¹). If $r_1$ and $r_2$
are SEREs, then $\{r_1\} \mapsto \{r_2\}!$ and $\{r_1\} \mapsto \{r_2\}$ are Sugar formulas of the linear
fragment.

The $\{r_1\} \mapsto \{r_2\}!$ and $\{r_1\} \mapsto \{r_2\}$ constructs are known as strong suffix
implication and weak suffix implication, respectively. Strong suffix implications are
liveness formulas, indicating that every sequence of states on which $r_1$ holds
must be followed by a sequence of states on which $r_2$ holds. Weak suffix
implications are safety formulas, indicating that every sequence of states on which
$r_1$ holds may not be followed by a sequence of states which contradicts $r_2$. For
instance, the Sugar formula $\{[*], p, q\} \mapsto \{s[*], t\}!$ requires that every sequence
of two states such that $p$ is valid in the first and $q$ is valid in the second, must
be followed by a sequence of states in which $s$ is valid for some number of states,
and then $t$ is valid in the final state of the sequence. The weak form of this
formula does not require that the second sequence “reach its end”: a sequence
matching $\{p, q\}$ must be followed either by a sequence in which $s$ holds forever,
or by a sequence in which $s$ holds for some number of states, and then $t$ holds.

Definition 4. (Sugar Formulas).

1. Every boolean expression is a Sugar formula.

2. Every Sugar formula of the linear fragment is a Sugar formula.

3. If $f$, $f_1$, and $f_2$ are Sugar formulas and $r$ is a SERE, then the following are
Sugar formulas: i) $\neg f$ ii) $f_1 \wedge f_2$ iii) $EX f$ iv) $E[f_1\ U\ f_2]$ v) $EG f$ vi) $\{r\}(f)$

The operators $\wedge$, EX, EU, and EG have the usual meaning. The construct
$\{r\}(f)$ (suffix implication) holds for a state $s$ if, for all finite sequences starting
from $s$ on which $r$ holds, formula $f$ holds on the final state in the sequence $r$.



3 Syntactic Sugar 

Because the basic language can be verbose, Sugar adds syntactic sugar: additional
operators which allow many properties to be expressed succinctly in an
intuitive manner. We will now illustrate the advantages of the syntactic sugar
with a few examples².

The next_event Operators. These operators are a conceptual extension of
the AX operator. While AX refers to the next state, next_event refers to the
next state in which a boolean expression is valid. For instance, the following:

$$AG(hi\_pri\_req \rightarrow next\_event\_f(gnt)[1..2](dst = hi\_pri)) \qquad (1)$$

expresses the requirement that whenever hi_pri_req is asserted, one of the next
two assertions of signal gnt must have dst equal to hi_pri.

¹ Note that Sugar formulas of the linear fragment are not closed under the boolean
operators. The result of a boolean operation on two Sugar formulas of the linear
fragment is a general Sugar formula as described in Definition 4.
² The abbreviations presented here and in Appendix B are given as an explanatory
semantics and do not imply the actual implementation.

The next_event operator and its variant $next\_event\_f(b)[1..2](f)$ are defined
in terms of the weak until (AW) operator as follows: $A[f_1\ W\ f_2]$ is equivalent
to $\neg E[\neg f_2\ U\ \neg f_1 \wedge \neg f_2]$, $next\_event(b)(f)$ is equivalent to $A[\neg b\ W\ b \wedge f]$, and
$next\_event\_f(b)[1..2](f)$ is equivalent to $next\_event(b)(f \vee AX\, next\_event(b)(f))$.
Thus, Formula 1 can be expressed in CTL with the addition of the AW operator
as follows:

$$AG(hi\_pri\_req \rightarrow A[\neg gnt\ W\ ((gnt \wedge dst = hi\_pri) \vee (gnt \wedge AX\, A[\neg gnt\ W\ (gnt \wedge dst = hi\_pri)]))]) \qquad (2)$$

The within Operators. The within operators ease the expression of requirements
such as the following: “every transaction must complete, and within every
transaction, a full data transfer must occur”, which is expressible in Sugar as:

$$AG\ within!(tr\_strt, tr\_end)\{true[*], dat\_strt, true[*], dat\_end\} \qquad (3)$$

$within!(r_1, b)\{r_2\}$ is equivalent to $\{r_1\} \mapsto \{r_2\ \&\&\ \{\neg b[*]\}, b\}!$. Thus, Formula 3
can be expressed (albeit somewhat cryptically) in CTL as follows:

$$AG(tr\_strt \rightarrow A[\neg dat\_strt \wedge \neg tr\_end\ U\ dat\_strt \wedge A[\neg tr\_end\ U\ dat\_end \wedge A[\neg tr\_end\ U\ tr\_end]]]) \qquad (4)$$

Counters. Counters are used to describe sequences of events that would otherwise
be tedious to specify. For example, $i$ consecutive occurrences of sequence $r$
can be expressed as $r[i]$, and $i$ non-consecutive occurrences of boolean expression
$b$ can be expressed as $b[= i]$. Formally, $r[0]$ is equivalent to $false[*]$, while $r[i]$ is
equivalent to $i$ concatenations of $r$, and $b[= i]$ is equivalent to $\{\neg b[*], b\}[i], \neg b[*]$.
The utility of the $b[= i]$ construct is illustrated in the following Sugar formula:

$$AG(\{go, \{get[= 8]\}\ \&\&\ \{kill[= 0]\}\} \mapsto \{true, \{put[= 8]\}\ \&\&\ \{end[= 0]\}\}) \qquad (5)$$

which expresses the requirement that a sequence beginning with the assertion
of signal go, and containing eight not necessarily consecutive assertions of signal
get, during which signal kill is not asserted, must be followed by a sequence
containing eight assertions of signal put before signal end can be asserted. The
equivalent CTL formula is both non-intuitive and tedious. The CTL formula
expressing the same requirement but for sequences of only two gets and puts
illustrates this point:

$$AG\,\neg(go \wedge EX\, E[\neg get \wedge \neg kill\ U\ get \wedge \neg kill \wedge EX\, E[\neg get \wedge \neg kill\ U\ get \wedge \neg kill \wedge (E[\neg put\ U\ end] \vee E[\neg put \wedge \neg end\ U\ (put \wedge \neg end \wedge EX\, E[\neg put\ U\ end])])]]) \qquad (6)$$

These formulas can also be expressed in LTL. However, the equivalent
LTL formulas are not any less daunting to code or decipher than the CTL
versions, while the Sugar version expresses the requirements succinctly, in a
manner accessible to the non-logician.
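The counter abbreviations are purely textual expansions, which the following sketch makes concrete. It is an illustration only, not part of any Sugar tool: the helper names `repeat` and `count_eq` are hypothetical, SEREs are handled as plain strings, and `!` is used as an assumed ASCII spelling of negation.

```python
def repeat(r: str, i: int) -> str:
    """Expand r[i]: false[*] for i = 0, otherwise i concatenations of r."""
    if i < 0:
        raise ValueError("repetition count must be non-negative")
    return "false[*]" if i == 0 else ", ".join([r] * i)


def count_eq(b: str, i: int) -> str:
    """Expand b[= i] into {!b[*], b}[i], !b[*] and then expand the [i]."""
    body = "{!" + b + "[*], " + b + "}"
    return repeat(body, i) + ", !" + b + "[*]"


print(count_eq("get", 2))   # two non-consecutive occurrences of get
print(count_eq("kill", 0))  # no occurrence of kill
```

Since `false[*]` can only match the empty word, the expansion of `kill[= 0]` is equivalent to `!kill[*]`, which is why it can be used to forbid a signal for the duration of a sequence.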
A Formal Semantics

The semantics of a Sugar formula are defined with respect to a model $M$. A model is a quintuple
$(S, S_0, R, P, L)$, where $S$ is a finite set of states, $S_0 \subseteq S$ is a set of initial states, $R \subseteq S \times S$ is the
transition relation, $P$ is a non-empty set of atomic propositions, and $L$ is the valuation, a function
$L : S \rightarrow 2^P$, mapping each state to the set of atomic propositions true in that state. $R$ is total
with respect to its first argument. A computation path $\pi$ of a model $M$ is an infinite sequence of
states $\pi = (\pi_0, \pi_1, \pi_2, \ldots)$ such that $R(\pi_i, \pi_{i+1})$ for every $i$. We will denote by $\pi^{i..j}$ a finite sequence
of states starting from $\pi_i$ and ending in $\pi_j$.

The semantics of SEREs are defined over the alphabet $2^P$. Thus, a letter is a subset of the set
of atomic propositions $P$. We will denote a letter from $2^P$ by $\ell$ and a finite word over $2^P$ by $w$.
The concatenation of $w_1$ and $w_2$ is denoted by $w_1 w_2$. The empty word is denoted by $\epsilon$, so that
$w\epsilon = \epsilon w = w$. The notation $w \in \mathcal{L}(r)$, where $r$ is a SERE, means that $w$ is in the language of $r$.
The semantics of SEREs are defined as follows:



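1. $w \in \mathcal{L}(p) \iff$ there exists an $\ell$ s.t. $w = \ell$ and $p \in \ell$

2. $w \in \mathcal{L}(\neg b) \iff w \notin \mathcal{L}(b)$

3. $w \in \mathcal{L}(b_1 \wedge b_2) \iff w \in \mathcal{L}(b_1)$ and $w \in \mathcal{L}(b_2)$

4. $w \in \mathcal{L}(r_1, r_2) \iff$ there exist $w_1$ and $w_2$ s.t. $w = w_1 w_2$ and $w_1 \in \mathcal{L}(r_1)$ and $w_2 \in \mathcal{L}(r_2)$

5. $w \in \mathcal{L}(r_1 \sim r_2) \iff$ there exist $w_1$, $w_2$, and $\ell$ s.t. $w = w_1 \ell w_2$ and $w_1 \ell \in \mathcal{L}(r_1)$ and $\ell w_2 \in \mathcal{L}(r_2)$

6. $w \in \mathcal{L}(r_1 || r_2) \iff w \in \mathcal{L}(r_1)$ or $w \in \mathcal{L}(r_2)$

7. $w \in \mathcal{L}(r_1\ \&\&\ r_2) \iff w \in \mathcal{L}(r_1)$ and $w \in \mathcal{L}(r_2)$

8. $w \in \mathcal{L}(r[*]) \iff$ either $w = \epsilon$ or there exist $w_1, w_2, \ldots, w_j$ s.t. $w = w_1 w_2 \ldots w_j$ and, for all $i$, $1 \leq i \leq j$, $w_i \in \mathcal{L}(r)$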
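The language definitions above translate almost directly into a recursive membership test. The following is a small sketch under stated assumptions, not the algorithm used by RuleBase: SEREs are encoded as nested tuples (an encoding invented here), words are lists of sets of atomic propositions, and boolean expressions, including negation, are only matched against single letters.

```python
def matches(r, w):
    """Return True if the word w (a list of sets of propositions) is in L(r)."""
    op, n = r[0], len(w)
    if op == "atom":                       # rule 1
        return n == 1 and r[1] in w[0]
    if op == "not":                        # rule 2, restricted to one letter (assumption)
        return n == 1 and not matches(r[1], w)
    if op in ("and", "&&"):                # rules 3 and 7
        return matches(r[1], w) and matches(r[2], w)
    if op == "||":                         # rule 6
        return matches(r[1], w) or matches(r[2], w)
    if op == ",":                          # rule 4: w = w1 w2
        return any(matches(r[1], w[:k]) and matches(r[2], w[k:])
                   for k in range(n + 1))
    if op == "~":                          # rule 5: w = w1 l w2, letter l shared
        return any(matches(r[1], w[:k + 1]) and matches(r[2], w[k:])
                   for k in range(n))
    if op == "[*]":                        # rule 8: empty word or non-empty pieces
        return n == 0 or any(matches(r[1], w[:k]) and matches(r, w[k:])
                             for k in range(1, n + 1))
    raise ValueError(f"unknown operator {op!r}")


P, Q, T = ("atom", "p"), ("atom", "q"), ("atom", "t")
word = [{"p"}, {"q"}, {"t"}]

print(matches((",", P, Q), word[:2]))                   # concatenation p, q
print(matches(("~", (",", P, Q), (",", Q, T)), word))   # overlap on the q letter
print(matches(("[*]", P), word[:2]))                    # p[*] does not match p q
```

The test is exponential in the worst case; it is meant to mirror the definitions, not to be efficient.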



Recall that every state $s \in S$ in a model $M = (S, S_0, R, P, L)$ is associated with a set of atomic
propositions by the valuation $L$. We define $\hat{L}$, an extension of the valuation function $L$, as follows:
$\hat{L}(\pi_i, \pi_{i+1}, \ldots, \pi_j) = L(\pi_i)L(\pi_{i+1}) \ldots L(\pi_j)$. Thus we have a mapping from states in $M$ to letters
of $2^P$, and from finite sequences of states in $M$ to words over $2^P$.

We now turn to the semantics of Sugar formulas. The notation $M, s \models f$ means that formula
$f$ holds in state $s$ of model $M$. The notation $M \models f$ is equivalent to $\forall s \in S_0 : M, s \models f$; in other
words, $f$ is valid for all initial states of $M$. We use $p$, $p_1$ and $p_2$ to denote atomic propositions, $b$,
$b_1$ and $b_2$ to denote boolean expressions, $r$, $r_1$ and $r_2$ to denote SEREs, and $f$, $f_1$ and $f_2$ to denote
Sugar formulas. The semantics of a Sugar formula are defined as follows:



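1. $M, s \models p \iff p \in L(s)$

2. $M, s \models \neg f \iff M, s \not\models f$

3. $M, s \models f_1 \wedge f_2 \iff M, s \models f_1$ and $M, s \models f_2$

4. $M, s \models \{r_1\} \mapsto \{r_2\}! \iff$ for all paths $\pi$ s.t. $\pi_0 = s$, for all $j$ s.t. $\hat{L}(\pi^{0..j}) \in \mathcal{L}(r_1)$, there exists a $k$ s.t. $\hat{L}(\pi^{j..k}) \in \mathcal{L}(r_2)$

5. $M, s \models \{r_1\} \mapsto \{r_2\} \iff$ for all paths $\pi$ s.t. $\pi_0 = s$, for all $j$ s.t. $\hat{L}(\pi^{0..j}) \in \mathcal{L}(r_1)$, either there exists a $k$ s.t. $\hat{L}(\pi^{j..k}) \in \mathcal{L}(r_2)$, or for all $k$, there exists a word $w$ (not necessarily a computation path in $M$) s.t. $\hat{L}(\pi^{j..k})w \in \mathcal{L}(r_2)$

6. $M, s \models EX f \iff$ for some path $\pi$ s.t. $\pi_0 = s$, $M, \pi_1 \models f$

7. $M, s \models E[f_1\ U\ f_2] \iff$ for some path $\pi$ s.t. $\pi_0 = s$, there exists $k$ s.t. $M, \pi_k \models f_2$ and for all $j$ s.t. $j < k$, $M, \pi_j \models f_1$

8. $M, s \models EG f \iff$ for some path $\pi$ s.t. $\pi_0 = s$, for all $j \geq 0$, $M, \pi_j \models f$

9. $M, s \models \{r\}(f) \iff$ for all paths $\pi$ s.t. $\pi_0 = s$, for all $j$ s.t. $\hat{L}(\pi^{0..j}) \in \mathcal{L}(r)$, $M, \pi_j \models f$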
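For the branching operators EX, EU and EG, the path-based definitions above coincide with the standard fixpoint characterisations on a finite model with a total transition relation. The sketch below uses that well-known characterisation on explicit sets; it is an illustration with an invented three-state model, not the symbolic procedure a tool such as RuleBase would use, and the function names are chosen here for the example.

```python
def ex(R, X):
    """States with at least one successor in X."""
    return {s for (s, t) in R if t in X}


def eu(R, X, Y):
    """E[X U Y] as a least fixpoint: Z = Y or (X and EX Z)."""
    Z = set(Y)
    while True:
        new = Z | (X & ex(R, Z))
        if new == Z:
            return Z
        Z = new


def eg(R, X):
    """EG X as a greatest fixpoint: Z = X and EX Z."""
    Z = set(X)
    while True:
        new = Z & ex(R, Z)
        if new == Z:
            return Z
        Z = new


# Invented toy model: 0 -> 0, 0 -> 1, 1 -> 2, 2 -> 2 (R is total).
R = {(0, 0), (0, 1), (1, 2), (2, 2)}
p_states, q_states = {0, 1}, {2}

print(ex(R, q_states))            # EX q
print(eu(R, p_states, q_states))  # E[p U q]
print(eg(R, p_states))            # EG p
```

Here EG p holds only in state 0, because the self-loop on 0 is the only infinite path that stays within the p-states.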
B The Full Syntactic Sugar

Additional boolean operators

1. $b_1 \vee b_2 = \neg(\neg b_1 \wedge \neg b_2)$
2. $b_1 \rightarrow b_2 = \neg b_1 \vee b_2$
3. $b_1 \leftrightarrow b_2 = (b_1 \rightarrow b_2) \wedge (b_2 \rightarrow b_1)$
4. $b_1 \oplus b_2 = (b_1 \wedge \neg b_2) \vee (\neg b_1 \wedge b_2)$

Additional SERE operators (where $i$, $j$, and $k$ are integer constants s.t. $i \geq 0$, $j \geq i$, $k > 0$)

1. $r[+] = r, r[*]$
2. $r[i] = false[*]$ if $i = 0$, and $r[i] = r, r, \ldots, r$ ($i$ times) otherwise
3. $r[i..j] = r[i]\ ||\ r[i+1]\ ||\ \ldots\ ||\ r[j]$
4. $r[i..] = r[i], r[*]$
5. $r[..i] = r[0]\ ||\ r[1]\ ||\ r[2]\ ||\ \ldots\ ||\ r[i]$
6. $[+] = true[+]$
7. $[*] = true[*]$
8. $[i] = true[i]$
9. $[i..j] = true[i..j]$
10. $[i..] = true[i..]$
11. $[..i] = true[..i]$
12. $b[= i] = \{\neg b[*], b\}[i], \neg b[*]$
13. $b[> i] = b[= i + 1], [*]$
14. $b[< k] = b[= 0]\ ||\ b[= 1]\ ||\ \ldots\ ||\ b[= (k - 1)]$
15. $b[\geq i] = b[> i]\ ||\ b[= i]$
16. $b[\leq i] = b[< i]\ ||\ b[= i]$
17. $b[> i, < j] = b[> i]\ \&\&\ b[< j]$
18. $b[\geq i, < j] = b[\geq i]\ \&\&\ b[< j]$
19. $b[> i, \leq j] = b[> i]\ \&\&\ b[\leq j]$
20. $b[\geq i, \leq j] = b[\geq i]\ \&\&\ b[\leq j]$

Additional linear operators

1. $always\{r\} = \{true[*]\} \mapsto \{r\}$
2. $never\{r\} = \{true[*], r\} \mapsto \{false\}$
3. $eventually\{r\} = \{true\} \mapsto \{true[*], r\}!$
4. $within!(r_1, b)\{r_2\} = \{r_1\} \mapsto \{r_2\ \&\&\ b[= 0], b\}!$
5. $within(r_1, b)\{r_2\} = \{r_1\} \mapsto \{r_2\ \&\&\ b[= 0]\}$



6. within\.{ri, b){r 2 } — {z^i} 1—^ {{"^2 0], fc}||{z ’2 ^}}}- 



7. $within\_(r_1, b)\{r_2\} = \{r_1\} \mapsto \{r_2\ \&\&\ \{\neg b[*], true\}\}$
8. $whilenot!(b)\{r\} = within!(true, b)\{r\}$
9. $whilenot(b)\{r\} = within(true, b)\{r\}$
10. $whilenot!\_(b)\{r\} = within!\_(true, b)\{r\}$
11. $whilenot\_(b)\{r\} = within\_(true, b)\{r\}$
12. $\{r_1\} \Rightarrow \{r_2\}! = \{r_1\} \mapsto \{true, r_2\}!$
13. $\{r_1\} \Rightarrow \{r_2\} = \{r_1\} \mapsto \{true, r_2\}$

Additional branching operators (where $i$ and $j$ are integers s.t. $i > 0$ and $j \geq i$)

1. $EF f = E[true\ U\ f]$
2. $AX f = \neg EX \neg f$
3. $AG f = \neg E[true\ U\ \neg f]$
4. $A[f_1\ U\ f_2] = \neg(E[\neg f_2\ U\ \neg f_1 \wedge \neg f_2] \vee EG \neg f_2)$
5. $AF f = A[true\ U\ f]$
6. $E[f_1\ W\ f_2] = E[f_1\ U\ f_2] \vee EG f_1$
7. $A[f_1\ W\ f_2] = \neg E[\neg f_2\ U\ \neg f_1 \wedge \neg f_2]$
8. $AX[i]f = AX\, AX \ldots AX f$ ($i$ times)
9. $ABG[i..j]f = AX[i]f \wedge AX[i+1]f \wedge \ldots \wedge AX[j]f$
10. $ABF[i..j]f = AX[i](f \vee AX(f \vee AX(\ldots \vee AX(f) \ldots)))$, with AX nested $j - i$ times
11. $f_1\ until!\ f_2 = A[f_1\ U\ f_2]$
12. $f_1\ until\ f_2 = A[f_1\ W\ f_2]$
13. $f_1\ until!\_\ f_2 = A[f_1\ U\ f_1 \wedge f_2]$
14. $f_1\ until\_\ f_2 = A[f_1\ W\ f_1 \wedge f_2]$
15. $f_1\ before!\ f_2 = A[\neg f_2\ U\ f_1]$
16. $f_1\ before\ f_2 = A[\neg f_2\ W\ f_1]$
17. $f_1\ before!\_\ f_2 = A[\neg f_2\ U\ f_1 \wedge \neg f_2]$
18. $f_1\ before\_\ f_2 = A[\neg f_2\ W\ f_1 \wedge \neg f_2]$
19. $next\_event!(b)(f) = A[\neg b\ U\ b \wedge f]$
20. $next\_event(b)(f) = A[\neg b\ W\ b \wedge f]$
21. $next\_event!(b)[i](f) = next\_event!(b)(AX\, next\_event!(b)(AX\, next\_event!(b) \ldots (AX\, next\_event!(b)(f)) \ldots))$, with $AX\, next\_event!(b)$ nested $i - 1$ times
22. $next\_event(b)[i](f) = next\_event(b)(AX\, next\_event(b)(AX\, next\_event(b) \ldots (AX\, next\_event(b)(f)) \ldots))$, with $AX\, next\_event(b)$ nested $i - 1$ times
23. $next\_event!(b)[i..j](f) = next\_event!(b)[i](f) \wedge \ldots \wedge next\_event!(b)[j](f)$
24. $next\_event(b)[i..j](f) = next\_event(b)[i](f) \wedge \ldots \wedge next\_event(b)[j](f)$
25. $next\_event\_f!(b)[i..j](f) = next\_event!(b)[i](f \vee AX\, next\_event!(b)(f \vee AX\, next\_event!(b) \ldots (f \vee AX\, next\_event!(b)(f)) \ldots))$, with $AX\, next\_event!(b)$ nested $j - i$ times
26. $next\_event\_f(b)[i..j](f) = next\_event(b)[i](f \vee AX\, next\_event(b)(f \vee AX\, next\_event(b) \ldots (f \vee AX\, next\_event(b)(f)) \ldots))$, with $AX\, next\_event(b)$ nested $j - i$ times
1 Introduction

Finite-state model checkers such as Smv and Spin cannot handle
important aspects that arise in modelling and analysing complex systems,
e.g., communication protocols. Among these aspects are real-time constraints,
manipulation of unbounded data structures like counters, communication through
unbounded channels, parametric reasoning, etc.

The tool we propose, called TReX, automatically analyses automata-based
models equipped with variables over different kinds of infinite-domain data
structures and with parameters (i.e., uninstantiated constants). These models
are, at the present time, parametric (continuous-time) timed automata,
extended with integer counters and communicating through unbounded lossy FIFO
queues.

The techniques used in TReX are based on symbolic reachability analysis. 
Symbolic representation structures are used to represent infinite sets of config- 
urations, and forward/backward exploration procedures are used to generate a 
symbolic reachability graph. Termination is not guaranteed, but efficient
extrapolation techniques are used to help achieve it. These techniques are based on
computing the (exact) effect of the iteration of control loops detected dynamically
during the search.

The kernel algorithm used in TReX is generic and can be used for any kind 
of data structures for which it is possible to provide a symbolic representation 
structure, a symbolic successor/predecessor function, and an extrapolation pro- 
cedure. In the current version, TReX provides packages for symbolic represen- 
tation of configurations of lossy FIFO channels and parametric timed automata 
and clock automata. 
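The three ingredients named above (a symbolic representation, a successor function, and an extrapolation procedure) can be plugged into a generic worklist loop. The sketch below is a toy illustration only and does not reproduce TReX's kernel: the function `reach`, its parameters, and the interval representation of a single counter are all assumptions made for the example. It shows why exact loop acceleration helps termination: without it, the loop `x := x + 1` produces infinitely many distinct symbolic configurations.

```python
import math


def reach(init, post, subsumes, accelerate=None, max_steps=1000):
    """Generic symbolic forward reachability (sketch).

    post(c) returns the symbolic successors of c, subsumes(a, b) tests
    whether a contains b, and accelerate(c), if given, returns the exact
    effect of iterating a detected loop from c. Returns the list of
    symbolic configurations, or None if the step budget is exhausted
    (termination is not guaranteed in general).
    """
    reached, work, steps = [], [init], 0
    while work:
        steps += 1
        if steps > max_steps:
            return None
        conf = work.pop()
        if any(subsumes(old, conf) for old in reached):
            continue
        if accelerate is not None:
            conf = accelerate(conf)
        reached.append(conf)
        work.extend(post(conf))
    return reached


# Toy system: one counter x, one control loop "x := x + 1".
# A symbolic configuration is an interval (lo, hi) of counter values.
post = lambda c: [(c[0] + 1, c[1] + 1)]
subsumes = lambda a, b: a[0] <= b[0] and b[1] <= a[1]
accelerate = lambda c: (c[0], math.inf)  # exact effect of iterating the loop

print(reach((0, 0), post, subsumes))              # budget exhausted
print(reach((0, 0), post, subsumes, accelerate))  # one symbolic configuration
```

With acceleration, the single interval $[0, \infty)$ subsumes every successor, so the search stops after two steps.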

TReX can check safety properties on-the-fly, and can also generate the
set of reachable configurations and a finite symbolic graph. The set of reachable
configurations can be used as an invariant of the system. For instance, if the 
analysed infinite-state model M is already an abstraction of a more concrete 
one M', the set of reachable configurations of M can be used to construct an 
invariant of M' which may help in its analysis. On the other hand, the generated 
finite symbolic graph is a finite abstraction of the analysed model, which can be 
used for (conservative) finite-state model checking. 
TReX is connected to the If environment, which allows: (1) the use of
high-level specification languages such as Sdl, (2) the interaction with abstraction
tools and invariant checkers such as InVeSt, (3) the use of finite-state
model checkers such as Cadp and Spin to verify properties on the finite
symbolic graph.

TReX has been used to analyse several nontrivial protocols in their parametric
versions, such as the Bounded Retransmission Protocol (BRP). This
particular example requires the full power of TReX since it is a parametric
heterogeneous model involving clocks, counters, and lossy channels. Moreover, the
constraints manipulated in this model are nonlinear (they contain products between
variables). As far as we know, TReX is the only existing tool which can handle
such a complex model fully automatically. Indeed, tools like HyTech
and LPmc deal with timed/hybrid automata and linear constraints, while
Lash deals with counter automata.

2 Architecture 

Figure 1 shows the overall environment and architecture of TReX.

In addition to the description of the model in If, the user of TReX can
specify the initial constraints (invariants) on parameters, the initial symbolic
configuration for the beginning of reachability analysis, and/or the safety property
to be checked on-the-fly, expressed by an observer written in If.






Fig. 1. Overview of TReX's Architecture and Environment.



From the analysis of the input model, TReX instantiates automatically the 
generic reachability algorithm with the representation structures needed by the 
infinite data domains used. 

Two such representations are currently provided in TReX. The first one is
well suited for representing the contents of unbounded lossy FIFO channels. We
implemented a package for manipulating a class of regular expressions, simple
regular expressions (SREs), which define exactly the class of downward-closed
regular languages. This representation is interesting because the operations
manipulating SREs during the symbolic analysis (the inclusion test, the effect of a
transition, and the effect of an arbitrary number of executions of a control loop) are
polynomial.

The second representation deals with sets of configurations of parametric
timed automata and counter automata. We implemented a package for manipulating
Constrained PDBMs (Parametric Difference Bound Matrices). The
use of Constrained PDBMs makes it possible to deal in a uniform way with counter/clock
automata, parametric/non-parametric models, and systems generating linear or
nonlinear arithmetical constraints. The package provides a compact representation
of PDBMs and efficient methods for the operations used during symbolic analysis
(e.g., emptiness check, intersection, and inclusion test). A special effort has
been devoted to developing an efficient representation of the terms and formulas used in
Constrained PDBMs, and simplification techniques on these objects. For this,
we implemented a package, called Foaf (First-Order Arithmetical Formulas),
which also gives the kind (linear or non-linear) of terms and formulas. This analysis
is needed in order to apply the right decision procedure for the satisfiability
of formulas.

The external decision procedures currently used by TReX are those offered
by Omega for formulas over integers and by the Redlog package of Reduce
for formulas over reals. Moreover, we implemented in the Foaf package
the Fourier-Motzkin procedure for elimination of quantifiers over real variables.
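Fourier-Motzkin elimination removes an existentially quantified real variable from a conjunction of linear inequalities by combining every upper bound on the variable with every lower bound. The following is a textbook-style sketch, not the Foaf implementation: the function name `eliminate` and the encoding of a constraint as a pair (coefficient tuple, bound), meaning $\sum_i a_i x_i \leq b$, are assumptions made here, and only non-strict inequalities are handled.

```python
from fractions import Fraction


def eliminate(constraints, k):
    """Eliminate variable x_k from a list of constraints.

    Each constraint is (coeffs, bound) and stands for
    sum(coeffs[i] * x[i]) <= bound. The result is an equivalent system
    (over the reals) in which x_k has coefficient 0 everywhere.
    """
    pos, neg, rest = [], [], []
    for coeffs, b in constraints:
        c = coeffs[k]
        (pos if c > 0 else neg if c < 0 else rest).append((coeffs, b))
    out = list(rest)
    for cp, bp in pos:            # upper bounds on x_k
        for cn, bn in neg:        # lower bounds on x_k
            # Scale so that x_k has coefficient +1 and -1, then add.
            sp = Fraction(1) / cp[k]
            sn = Fraction(1) / -cn[k]
            coeffs = tuple(sp * a + sn * c for a, c in zip(cp, cn))
            out.append((coeffs, sp * bp + sn * bn))
    return out


# exists y. (x - y <= 0 and y <= 5), over variables (x, y)
print(eliminate([((1, -1), 0), ((0, 1), 5)], 1))

# exists y. (y <= 0 and -y <= -1): yields the contradiction 0 <= -1
print(eliminate([((1,), 0), ((-1,), -1)], 0))
```

The first call yields the single constraint $x \leq 5$. The number of constraints can grow quadratically at each elimination step, which is one reason simplification of the resulting formulas matters in practice.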

The symbolic graph generated by TReX is given by two files: a file
describing the transitions between reachable symbolic configurations, given in the
Aldebaran format, and a file listing the reachable symbolic configurations. The
Aldebaran file can be directly used for finite model checking using the Cadp
tool. Reachable configurations may be used to extract new initial constraints
(invariants) for the model and to do abstraction with InVeSt.

Each part of the TReX architecture has been implemented as an independent
C++ module. This allows easy extension of TReX with new symbolic
representations, analysis algorithms, and decision procedures.



3 Results and Future Work 



TReX has been applied to a number of infinite-state and/or parameterized
protocols, including a lift controller, the Bakery algorithm, the BRP protocol, the FDDI protocol,
Fischer’s protocol, and the alternating bit protocol (ABP).

Table 1 gives the performance obtained by applying TReX to these examples.
We consider two versions of TReX, depending on the package used for the
decision procedure on reals: the first (Standard) uses the Foaf package and the
second uses Redlog.

The columns “version” specify the number of different kinds of variables
used by each example: p for parameters, c for clocks, n for counters, f(m) for
lossy FIFO channels with m messages, b for booleans, and e(v) for enumerations
with v values. The column “# reach. config.” specifies the number of reachable
symbolic configurations generated during symbolic analysis.



Table 1. Performance Statistics on a Sun Ultra 10 (Space in Mbytes, Time in seconds).

                        version                    Standard            with Redlog      # reach.
Case study    p    c    n    f(m)   b    e(v)    space       time    space       time    config.
Lift 10       -    -    3    -      -    -         6.5       7.52      6.5       7.52          8
Lift N        1    -    3    -      -    -         6.5       8.05      6.5       8.05          9
Bakery        -    -    2    -      -    -         6.6       5.68      6.6       5.68         33
Fischer       2    2    -    -      -    1(3)      7         0.65      7         0.61         25
              2    3    -    -      -    1(4)      9.2     159.04      8.2     105.82        261
              2    4    -    -      -    1(5)    140      124 920    140       70 316      3 633
ABP           -    -    -    2(4)   -    -         6.9       0.05      6.9       0.05          8
FDDI          4    5    -    -      2    -        20      1 603.50    21         4445        731
BRP           -    -    -    2(4)   -    -         6.8       0.30      6.8       0.30         36
              2    -    2    2(7)   4    -        16.4     195.93     16.4     195.93        173
              4    2    1    2(6)   2    -        89
The most complex example to which TReX has been applied is the BRP
protocol. It is a timed file transfer protocol used by Philips. The three versions
verified correspond to: (1) abstraction of clocks and counters, where only lossy FIFO
channels are considered, (2) abstraction of clocks, where counters and channels are
used, (3) the full version with channels, counters for the number of retransmissions,
and clocks for timeouts. For the last version, TReX automatically generates
the (non-linear) constraint needed to satisfy the timing response property of the
protocol. The constraint relates three parameters of the protocol: the timeouts
for the sender and for the receiver, and the number of retransmissions.

In future work, we plan to implement other data structures for the representation
of configurations over counters and clocks, as well as to extend the
input model to infinite nets of identical processes. Version 1.0 of TReX
is available at http://www-verimag.imag.fr/~ajinichin/trex/
Abstract. In this paper we present a tool which operates as a pre- and
postprocessor for RTL property checking and simplifies word-level speci- 
fications before verification, thus speeding up property checking runtimes 
and allowing larger design sizes to be verified. The basic idea is to scale 
down design sizes by exploiting word-level information. BooStER imple- 
ments a new technique which computes a one-to-one RTL abstraction 
of a digital design in which the widths of word-level signals are reduced 
with respect to a property, i.e. the property holds for the abstract RTL 
if and only if it holds for the original RTL. The property checking task 
is completely carried out on the scaled-down version of the design. If the 
property fails then the tool computes counterexamples for the original 
RTL from counterexamples found on the reduced model. 



1 High-Level Property Checking of Digital Designs 

Today’s digital circuit designs frequently contain up to several million transistors,
and designs need to be checked to ensure that manufactured chips operate correctly.
Formal methods for verification are becoming increasingly attractive since
they confirm design behavior without exhaustively simulating a design. Over
the past years, bounded model checking and property checking have increased
in significance in electronic design automation. Promising approaches to
enhance the capabilities of hardware verification tools are decision procedures which
make use of high-level design information, and automated abstraction
techniques, e.g. using uninterpreted functions and small domain instantiations.

We consider a property checking flow in which design specifications are given 
as VHDL or Verilog source code. Properties are specified in a linear time logic 
used in Symbolic Trajectory Evaluation and describe the intended behavior of 
the design within a finite bounded interval of time. As a first step, design and 
property are synthesized into a flattened RTL netlist, including word-level sig- 
nals, word-level gates, arithmetic units, comparators, multiplexors and memory 
elements. Each word-level signal $x$ has a fixed width $n \in \mathbb{N}_+$ and takes bitvectors
of respective length as values. A property checker, which reads RTL netlists
as input, translates such a representation of design and property into an internal
bit-level representation (i.e. an instance of propositional SAT) and uses SAT,
BDD and ATPG methods to either prove that the property holds for the given
design or to compute a counterexample. A counterexample is an indication that
a circuit does not function in the way intended by the designer and is given in
terms of assignments of values to the circuit inputs, such that a violation of the
desired behavior, which is described by the property, can be observed.

BooStER ( Boo lean String Length Reduction) implements a new word-level 
abstraction technique developed in j^, which is embedded within the flow. In 
a preprocessing step prior to the property checker (see fig.), the tool takes the
RTL netlist and computes a scaled-down RTL model of the design in which each
word-level signal x is replaced by a corresponding shrunken signal of width
$m_x < n$, where n is the original width of x, while guaranteeing that the
property holds for the reduced RTL if and only if it holds for the original RTL.
Design and abstract model differ from each other only as far as signal widths
are concerned. The reduced RTL is given to the property checker instead of the
original RTL. Depending on the degree of reduction, the internal bit-level
representation computed from the reduced RTL contains significantly fewer
variables than the one computed from the non-reduced RTL. If the property does
not hold, the property checker returns a counterexample in terms of the signals
of the reduced RTL. From it, a corresponding counterexample for the original RTL
is computed, using information about the applied reduction gathered during
preprocessing.



[Figure: verification flow. Design Specification (VHDL, Verilog); RTL Representation; Counterexample for Original RTL]



2 Scaling Down RTL Designs by Signal Width Reduction 

BooStER reads an RTL representation of a design and a property and generates a 
system E of equations over a theory of fixed-size bitvectors based on [Z1 , which is 
an extension of the core theory of bitvectors presented in jSj . Our theory features 
high-level operators like bitwise Boolean operations, arithmetic (cf. [2]) and
if-then-else, and allows complete RTL designs to be modeled. E is satisfiable if and
only if the property does not hold for the RTL. Word-level signals in the RTL
correspond to bitvector variables in E, so the information about which bits belong
to the same signal is kept. A satisfying solution of E yields a counterexample for
the RTL. For each bitvector variable occurring in E, the smallest possible number
of bits is computed such that a second system E' of bitvector equations, which
differs from E solely in that variable widths are shrunk to these computed
numbers, is satisfiable if and only if E is satisfiable. E' is generated
using these minimum signal widths and then retranslated into a netlist, which
is output by the tool and represents a scaled-down version of the original RTL.
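
To make the "satisfiable if and only if" requirement concrete, here is a minimal brute-force sketch in Python. It is not the BooStER algorithm: it assumes a toy system E that contains only pairwise disequalities between bitvector variables, and the function names `satisfiable` and `min_width` are hypothetical. For such a system, the smallest satisfiability-preserving width is the smallest m for which 2^m values suffice to keep the variables apart.

```python
from itertools import product

def satisfiable(num_vars, width, disequalities):
    """Brute force: is there an assignment of `width`-bit values to the
    variables such that x_i != x_j for every (i, j) in `disequalities`?"""
    for values in product(range(2 ** width), repeat=num_vars):
        if all(values[i] != values[j] for i, j in disequalities):
            return True
    return False

def min_width(num_vars, original_width, disequalities):
    """Smallest width m (1 <= m <= original_width) whose satisfiability
    answer agrees with the answer at the original width."""
    target = satisfiable(num_vars, original_width, disequalities)
    for m in range(1, original_width + 1):
        if satisfiable(num_vars, m, disequalities) == target:
            return m
    return original_width

# Toy system E: three 4-bit signals that must be pairwise different.
E = [(0, 1), (0, 2), (1, 2)]
print(min_width(3, 4, E))         # 2: three distinct values need two bits
print(min_width(2, 4, [(0, 1)]))  # 1: two distinct values fit in one bit
```

The brute-force search is exponential and only serves to show what the reduced widths must guarantee; the tool derives them from a dependency analysis instead.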



[Figure: reduction flow. RTL Netlist; Data Dependency Analysis; System E of Bitvector Equations; Minimum Width Computation; Shrunken Signal Widths; System E' of Bitvector Equations; Reduced RTL Netlist]



The process of scaling down signal widths is separated into two subsequent
phases. The high-level operators occurring in the equations of E impose
structural and functional dependencies on the bitvector variables. Variables
typically have non-uniform data dependencies, i.e. different dependencies exist
for different chunks of a signal. Our method analyzes such dependencies and, for
each variable, determines the contiguous parts in which all bits are treated
uniformly with respect to data dependencies. Such a decomposition of a variable
into a sequence of chunks is called a granularity. For each such chunk of a
signal, the necessary minimum width is computed, as required by dynamical data
dependencies. According to these computed minimum chunk widths, the reduced
width for the corresponding shrunken signal is reassembled (see |7I?S| for
further details on the reduction).
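
The notion of a granularity can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the only dependencies considered are bit-slice accesses x[hi:lo] to a single signal, and the function name `granularity` is hypothetical. The analysis described above also propagates dependencies between variables, which this sketch omits.

```python
def granularity(width, slices):
    """Coarsest decomposition of bits 0..width-1 into contiguous chunks
    such that no slice (hi, lo), bounds inclusive, cuts through a chunk."""
    cuts = {0, width}
    for hi, lo in slices:
        cuts.add(lo)       # chunk boundary at the bottom of the slice
        cuts.add(hi + 1)   # chunk boundary just above the slice
    bounds = sorted(cuts)
    # Return chunks as (hi, lo) pairs, most significant chunk first.
    return [(b - 1, a) for a, b in zip(bounds, bounds[1:])][::-1]

# A 16-bit signal whose upper byte and lowest nibble are accessed separately.
print(granularity(16, [(15, 8), (3, 0)]))
# [(15, 8), (7, 4), (3, 0)]
```

Each resulting chunk can then be given its own minimum width, and the reduced signal is the concatenation of the shrunken chunks.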

3 Experimental Results 

BooStER is implemented in C++ and was tested in several case studies at the
EDA department of Siemens Corp. in Munich and at Infineon Techn. in San Jose.
Test cases were run on a PII 450 MHz Linux PC with 128 MB. The tool operated
as a preprocessor to the property checker used at Siemens and Infineon. All
runtimes on reduced models were compared to those achieved on the original designs
without preprocessing. As an example, we here consider the management unit
of an ATM switching element. The design consists of 3,000 lines of Verilog code;
the synthesized netlist has approx. 24,000 gates and 35,000 RAM cells. The RTL
incorporates 16 FIFO queue buffers and complex control logic. Data packages
are fed on 33 input channels to the management unit, stored in the FIFOs and
upon request are output on one of 17 output channels, while the cell sequence
has to be preserved and no package must be dropped from the management unit.
                               Property      Original design     Reduced model
Computation times for          nop                               2.96 secs
pre- and postprocessing                                          6.53 secs
                               nop           160 cells x 10 bit
... of property                nop
                               read                              10592 (33.6 %)
                               write                             5163 (35.3 %)
Overall number of gates in     nop                               5661 (27.9 %)
synthesized netlist            read / write  23801               7929 (33.3 %)
Number of state bits           nop           1658                362 (21.8 %)
                               read / write  1658                524 (31.6 %)
Property checker runtimes      nop           23:33 min           37.96 secs (2.7 %)
                               read          42:23 min           3:27 min (8.1 %)
                               write_fail    2:08 min            25.66 secs (19.5 %)
                               write_hold    27:08 min           1:08 min (4.2 %)

Three different properties (nop, read, write) had to be verified, which specified 
the intended behavior within a range of 4 timesteps (nop, write) and 6 timesteps 
(read). Results and CPU times are shown above. As can be seen, in all cases
the data path signals could be scaled. This is illustrated in the block diagrams
above, showing the original design and the reduced model for the read property.
We encountered a significant reduction in the sizes of the design models and
a tremendous drop in the runtimes of the property checker. It turned out that
the write property did not hold due to a design bug in the Verilog code. A
counterexample for the reduced model was found (write_fail), from which the
tool computed a counterexample for the original design, whereupon the bug was
fixed and the property was checked again on the corrected design (write_hold).
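
The percentages in the results relate each reduced-model figure to its original-design counterpart. As a quick arithmetic check, the following Python sketch recomputes a selection of them from the printed values; the helper names `seconds` and `percent` are hypothetical and serve only this illustration.

```python
def seconds(text):
    """Convert 'M:SS min' or 'S.SS secs' into seconds."""
    value, unit = text.split()
    if unit == "min":
        minutes, secs = value.split(":")
        return int(minutes) * 60 + int(secs)
    return float(value)

def percent(reduced, original):
    """Reduced value as a percentage of the original, one decimal place."""
    return round(100.0 * reduced / original, 1)

print(percent(362, 1658))     # 21.8  state bits, nop
print(percent(524, 1658))     # 31.6  state bits, read / write
print(percent(7929, 23801))   # 33.3  gates, read / write
print(percent(seconds("37.96 secs"), seconds("23:33 min")))  # 2.7  nop
print(percent(seconds("3:27 min"), seconds("42:23 min")))    # 8.1  read
print(percent(seconds("1:08 min"), seconds("27:08 min")))    # 4.2  write_hold
```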

4 Conclusions 

Reducing runtimes and the amount of memory needed in computations is one 
requirement in order to match today’s sizes of real world designs in hardware 
verification. We have presented a tool that efficiently simplifies word-level cir- 
cuit specifications for RTL property checking by scaling down the widths of 
input, output and internal signals. A linear reduction from n bits down to m
bits, m < n, causes an exponential reduction of the induced state space of the
signal from $2^n$ to $2^m$, and reduced state space sizes coincide with increased
verification performance. Our method provides a one-to-one RTL abstraction
which interprets all RTL operators and which strictly separates the pre- and
postprocessing of design and counterexample from the property checking process
itself. Thus, the proposed method is independent of the concrete realization of
the property checker and can be combined with a variety of existing techniques
which take RTL netlists as input. Because the abstraction is one-to-one,
postprocessing of counterexamples is straightforward and false negatives cannot
occur. Moreover, if preprocessing yields that no reduction is possible for a given
design and a property, then abstract model and original design are identical, so
the verification task itself is not impaired by using the proposed abstraction, and
in all case studies pre- and postprocessing runtimes were negligible. Test cases
showed that the tool cooperated particularly well with a SAT and BDD based 
property checking multi-engine, because the complexity of those techniques often 
depends on the number of bits occurring in a design. Furthermore, experiments 
revealed that the proposed abstraction seems to be well qualified for hardware 
verification of memories, FIFOs, queues, stacks, bridges and interface protocols. 
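
As a worked instance of the exponential effect of a width reduction, consider a hypothetical signal reduced from n = 10 to m = 3 bits (these widths are chosen for illustration and are not taken from the case study):

```latex
\[
  \frac{2^{n}}{2^{m}} \;=\; 2^{\,n-m} \;=\; 2^{\,10-3} \;=\; 2^{7} \;=\; 128 .
\]
```

The value space of that single signal shrinks by a factor of 128, although its width shrinks only by a factor of 10/3.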
Introduction

SDLcheck is a verification tool developed to support model checking for
asynchronous (concurrent) programs written in SDL [1,2]. Given an SDL program
and a specification of a desired behavior of the program, SDLcheck generates
a verification model that consists of two ω-automata, P and T: P models the
program and T the specification. Then the automaton language containment
$L(P) \subseteq L(T)$ is tested by model checking with Cospan [3].

The majority of model checking tools designed for asynchronous program
verification make use of interleaving systems as a model platform. In contrast,
SDLcheck translates asynchronous SDL programs into synchronous ω-automata.
Concurrent execution (interleaving) of SDL processes is modeled using a simple
technique described below in the paper. This choice makes it possible to
combine efficiently partial order reduction, which is known to be useful for
asynchronous programs, with BDD-based symbolic verification, which is known
to be useful for large synchronous models. For this, SDLcheck implements the
algorithm described in [4], which realizes partial order reduction by modifying
a program model P prior to model checking. Although model checking
tools for SDL and other programming and design languages are being intensively
developed in research, in a practical sense they mostly remain prototypes
lacking the optimizations necessary to cope with large programs. There are several
advanced model checking tools that mainly relate to hardware verification, where
synchronous ω-automata, on the one hand, naturally match synthesizable hardware
designs and, on the other hand, support symbolic verification. For software
verification, combining IF [5] and SPIN [6], as reported in [7], supports
complementary sets of model checking optimizations. This combination nonetheless lacks
symbolic verification, as do all other SDL verification tools of which we are aware.

SDLcheck is also capable of supporting software/hardware co-design
verification. This is realized through Cospan, which is also used as the model checker
in hardware verification, namely, in the commercial tool FormalCheck™.¹
Through the synchronous ω-automaton model platform, SDLcheck combined
with Cospan supports both software-specific and hardware-specific model
checking optimizations.



SDL Subset and Co-design Extensions 

SDLcheck accepts the SDL'96 standard [2] including ASN.1 related features,
however, without axiomatic data definitions, services and OO features. It also
requires the SDL program model to be finite state, so no unbounded recursion.

¹ Licensed by Lucent Technologies to Cadence Design Systems.

To support co-verification, SDLcheck implements extensions to SDL (suggested
in [8]) that allow description of a software process interfacing to a hardware
module. The hardware part of a co-design is expressed in a hardware description
language, Verilog or VHDL. On the software side written in SDL,
SDLcheck supports read/write access to a hardware variable (wire or port) 
through the declaration of an associated interface variable. The interface variable 
is either sourced from, or feeds the hardware variable. SDLcheck also supports 
a none input action. It does not read the process buffer and only triggers a 
transition from the current state of the process when the enabling condition 
which guards this action evaluates to true. A none input action matches well 
the concept of a hardware transition triggered by an event, such as clock rising 
(or falling) or signal reset. Being associated with an interface variable value, 
a hardware event, say, value 1 on wire A.B.y, may be tested in the enabling 
condition and trigger a transition in the corresponding software process. Once 
triggered, this transition executes like a hardware transition: synchronously (si- 
multaneously) with all enabled transitions of the co-design hardware part. Other 
(software) transitions of software processes execute asynchronously according to 
usual SDL rules. 



Verification Technology and the Tool Architecture 

SDLcheck performs three steps: 

1. The compiler sdl2sr translates both an SDL program and a behavior
specification into S/R, the input language of Cospan. The specification is expressed
using macro notations always(x), eventually(x), etc. that reflect linear
temporal logic operators and useful combinations of those, with arguments being
SDL boolean expressions over the program variables. The specification
language is similar to that used in FormalCheck™. In a co-design case, S/R
code generated by sdl2sr is mechanically concatenated with S/R code
produced by the FormalCheck™ compiler for hardware modules.

2. Cospan performs model checking on this S/R code, with any valid combina- 
tion of its options, including symbolic verification and localization reduction. 
If it detects that the program model fails to satisfy the specification, it pro- 
duces an error track demonstrating one of the failure scenarios. 

3. The tool T2sdl extracts from the error track pieces related to the SDL 
program and prints those with back referencing S/R names to SDL sources. 
In a co-design case, the remaining pieces are back referenced to the HDL. 



Translation into S/R ω-Automata

In S/R, an ω-automaton that models an SDL program is described as a
synchronous product of primitive ω-automata, each being represented by a distinct
state variable whose transitions are defined by a single if-then-else constructor:

\[ \texttt{asgn}\ x \rightarrow a_1\ ?\ g_1 \mid a_2\ ?\ g_2 \mid \cdots \mid a_{n-1}\ ?\ g_{n-1} \mid a_n \]

where the omitted guard of the default alternative $a_n$ is true and the complete
guard for alternative $a_i$, $1 \le i < n$, is $\neg g_1 \wedge \dots \wedge \neg g_{i-1} \wedge g_i$.
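
The priority semantics of this constructor can be sketched in a few lines of Python. This is an illustration only, not S/R syntax or Cospan code; the function name `asgn` and its argument layout are assumptions made for the example.

```python
def asgn(alternatives, default):
    """Evaluate an if-then-else constructor.

    `alternatives` is a list of (value, guard) pairs in textual order.
    The first pair whose guard is true determines the result, which is
    the same as using the complete guard
    (not g_1) and ... and (not g_{i-1}) and g_i for alternative i.
    """
    for value, guard in alternatives:
        if guard:
            return value
    return default  # the default alternative has the guard `true`

# Both g_1 and g_2 hold, but only a_1 is selected.
print(asgn([("a1", True), ("a2", True)], "a3"))    # a1
# g_1 fails, so the complete guard of a_2 holds.
print(asgn([("a1", False), ("a2", True)], "a3"))   # a2
# No guard holds: the default alternative is taken.
print(asgn([("a1", False), ("a2", False)], "a3"))  # a3
```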

After flattening complicated data (structures, arrays, etc.), each variable of 
an SDL program is translated into a separate state variable. 

The sequentiality of process actions is implemented by designating one state
variable per process to encode the process control flow graph, say, variable $C_Q$
for process Q. It works like a program counter: it is assigned to labels of a
process' input states and statements, assuming that all statements have been
labeled. Transitions between values of $C_Q$ mimic the control flow in the process
Q. The variable $C_Q$ is then used in transition guards of other variables owned
by the process, including its local and shared variables, and buffer. Since the
buffer queue is updated by both the owner process and a sender process, a
buffer transition guard may also test whether the sender process program counter
points to the corresponding output action.

In S/R, non-determinism may be captured and controlled using selection
variables [3] that are assigned to sets of values rather than to distinct values.
Selection variables do not contribute to the state space. The concurrency
(interleaving) of actions executed by different SDL processes is implemented by
designating a special selection variable, say, S, which is non-deterministically
assigned to any one of the SDL program processes:

\[ \texttt{asgn}\ S := \{Q_1, \dots, Q_k\}. \]

Then, each normal transition of the program counter $C_Q$ is guarded by the
condition $S = Q$. If the condition evaluates to false, the program counter $C_Q$
self-loops at its current point. For example, let the SDL process Q consist of
only one statement which is a two-branch decision (i.e. if-then-else) and variable
x be assigned, respectively, to $y_1$ and $y_2$ in its branches. Then, S/R code for this
process will have these two assignments:

\[
\begin{array}{l}
\texttt{asgn}\ C_Q \rightarrow \\
\quad L_{then}\ ?\ (S = Q) \wedge (C_Q = L_{start}) \wedge dif \ \mid \\
\quad L_{else}\ ?\ (S = Q) \wedge (C_Q = L_{start}) \wedge \neg dif \ \mid \\
\quad L_{stop}\ ?\ (S = Q) \wedge (C_Q = L_{then} \vee C_Q = L_{else}) \wedge true \ \mid \\
\quad C_Q \\
\texttt{asgn}\ x \rightarrow y_1\ ?\ (S = Q) \wedge (C_Q = L_{then}) \ \mid\ y_2\ ?\ (S = Q) \wedge (C_Q = L_{else}) \ \mid\ x
\end{array}
\]

where dif is the decision condition and $L_{start}$, $L_{then}$, $L_{else}$, $L_{stop}$ are labels of
nodes in the process control flow graph. Thus, the variable S models the
interleaving of the processes $Q_1, \dots, Q_k$ and $C_Q$ the control flow in the process Q.
Note the regular structure of the $C_Q$ alternatives: in each alternative guard, its
rightmost conjunction factor expresses the condition under which the process
control flow (whenever allowed to move by $S = Q$) moves from its current
point, which is captured by the middle conjunction factor, to its next point,
which is the alternative's value.
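
The behavior of these two assignments can be traced with a small Python sketch. It is a hand-written model of the example, not output of sdl2sr: the function `step`, the label strings and the second process name "R" are assumptions made for illustration. Both assignments read the current value of the program counter, mirroring the synchronous update.

```python
def step(state, S, dif, y1, y2, me="Q"):
    """One synchronous step of the two assignments for process Q.
    `state` is (C_Q, x); `S` is the current value of the selection variable."""
    pc, x = state
    selected = (S == me)
    # asgn C_Q -> ...
    if selected and pc == "start" and dif:
        next_pc = "then"
    elif selected and pc == "start" and not dif:
        next_pc = "else"
    elif selected and pc in ("then", "else"):
        next_pc = "stop"
    else:
        next_pc = pc  # self-loop, e.g. when Q is not selected
    # asgn x -> ...
    if selected and pc == "then":
        next_x = y1
    elif selected and pc == "else":
        next_x = y2
    else:
        next_x = x
    return (next_pc, next_x)

state = ("start", 0)
for S in ["R", "Q", "R", "Q", "Q"]:  # one possible interleaving schedule
    state = step(state, S, dif=True, y1=10, y2=20)
    print(S, state)
```

Whenever the schedule selects R, the state of Q is unchanged; Q only advances on the steps where S = Q.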



Optimizations 

On top of this method, SDLcheck implements partial order reduction, which
optimizes model checking by selecting only one of all possible interleavings
between independent actions, provided that the others have the same verification
effect on the behavior specification. This optimization is implemented by modifying
the original ω-automaton model of an SDL program to restrict its transition
relation. For this, SDLcheck imposes a control over the selection variable S.
Namely, if an action of process Q may be selected to execute while ignoring other
possible interleavings, it is marked by the SDLcheck compiler as ample. In the
optimized model, the selection variable S is forced to be assigned to process
Q if the current action of this process is ample. If there are several such
processes, only one of them is chosen: this is a deterministic though arbitrary choice,
made in advance by the compiler. Only when no ample actions are enabled does S
remain non-deterministically assigned by the model to any one of the program
processes $\{Q_1, \dots, Q_k\}$. This technique may significantly reduce the original
non-determinism in the state space exploration. The objective is to have more ample
actions. As explained in [4], non-ample actions appear, in particular, because of
the necessity to break global cycles in the state space exploration by allowing
the complete non-determinism at least at one point in each global cycle. To
deal with this problem statically, we might mark one action as non-ample in every
local loop in each process control flow graph. However, SDLcheck performs better.
It statically analyzes control flow loops that belong to different processes but
semantically compensate each other: for example, a loop with output of signal
z is compensated by a loop (in a different process) with an input action for the
same signal z. As shown in [4], to break a global cycle that executes along
compensated control flow loops, it is sufficient to have a non-ample action in only
one of those loops. This is how SDLcheck implements partial order reduction. As
an option, SDLcheck strengthens this optimization further by forcing all current
actions that have been marked ample to execute simultaneously "by a parallel
leap" (instead of sequentially).
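
The effect of constraining the selection variable can be seen on a toy model, sketched below in Python. The sketch makes strong assumptions that are not part of the tool: all actions are mutually independent, all are ample, and the processes are loop-free, so the global-cycle condition never arises. The function name `reachable` is hypothetical.

```python
from collections import deque

def reachable(num_procs, steps, ample):
    """Count reachable states of `num_procs` independent processes, each
    performing `steps` local actions. A state is the tuple of program
    counters. With ample=True, the selection variable S is forced to the
    lowest-numbered process that can still move (a fixed choice made in
    advance); otherwise S may be any process that can move."""
    start = (0,) * num_procs
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        movable = [p for p in range(num_procs) if state[p] < steps]
        choices = movable[:1] if ample else movable
        for p in choices:
            nxt = state[:p] + (state[p] + 1,) + state[p + 1:]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

print(reachable(3, 4, ample=False))  # 125 = (4 + 1) ** 3
print(reachable(3, 4, ample=True))   # 13  = 3 * 4 + 1
```

With full non-determinism every combination of program counters is explored, whereas the constrained selection follows a single representative interleaving.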



Applications 

[9] reports on verification of a robot control system developed in a UML-like
graphical notation. The verification was supported by translating the robot
control system into an internal representation of SDL used by SDLcheck and then
applying SDLcheck and Cospan for model checking. SDLcheck has also been applied to
debugging an SDL description of the H.248 gateway control protocol issued by
ITU-T in 2000.



References 

1. ITU-T Recommendation Z.100 (03/93), Specification and Description Language
(SDL), Geneva, 1993.

2. ITU-T Recommendation Z.100 (10/96), Specification and Description Language
(SDL), Addendum 1, Geneva, 1996.

3. R. P. Kurshan, Computer-Aided Verification of Coordinating Processes: The
Automata-Theoretic Approach, Princeton University Press, 1994.

4. R. P. Kurshan, V. Levin, M. Minea, D. Peled, and H. Yenigün, Static Partial
Order Reduction, Proc. of the 4th International Conference on Tools and Algorithms
for the Construction and Analysis of Systems, LNCS no. 1384, pp. 345-357, 1998.

5. M. Bozga, J. C. Fernandez, L. Ghirvu, S. Graf, J. P. Krimm, L. Mounier, J. Sifakis,
IF: An Intermediate Representation for SDL and its Applications, Proc. of the SDL
Forum, Montreal, Canada, 1999.

6. G. J. Holzmann, The Model Checker Spin, IEEE Trans. on Software Engineering,
Vol. 23, No. 5, 1997.

7. D. Bosnacki, D. Dams, L. Holenderski, N. Sidorova, Model checking SDL with
Spin, Proc. of the Tools and Algorithms for the Construction and Analysis of
Systems, Berlin, Germany, 2000.

8. V. Levin, E. Bounimova, O. Başbuğoğlu, and K. Inan, A Verifiable
Software/Hardware Co-design Using SDL and COSPAN, Proceedings of the COST 247
International Workshop on Applied Formal Methods in System Design, Maribor,
Slovenia, pp. 6-16, 1996.

9. N. Sharygina, R. P. Kurshan, J. C. Browne, A Formal Object-oriented Analysis
for Software Reliability, to appear at FASE 2001.



EASN: Integrating ASN.1 and Model Checking



Vivek K. Shanbhag¹, K. Gopinath*¹, Markku Turunen²,
Ari Ahtiainen², and Matti Luukkainen³

¹ CSA Dept, Indian Institute of Science, Bangalore, India
{vivek,gopi}@csa.iisc.ernet.in
² Nokia Research Center, Helsinki, Finland
{markku.turunen,ari.ahtiainen}@nokia.com
³ Department of Computer Science, University of Helsinki, Finland
Matti.Luukkainen@cs.Helsinki.fi



Abstract. Telecommunication protocol standards have in the past used, and
typically still use, both an English description of the protocol and an
ASN.1 specification of the data model. ASN.1 (Abstract Syntax Notation
One) is an ITU/ISO data definition language which has been developed
to describe abstractly the values protocol data units can assume;
this is of considerable interest for model checking as ASN.1 can be used
to constrain/construct the state space of the protocol accurately. However,
with current practice, any change to the English description cannot
easily be checked for consistency while protocols are being developed. In
this work, we have developed a SPIN-based tool called EASN (Enhanced
ASN.1) where the behavior can be formally specified through a language
based upon Promela for control structures but with data models from
ASN.1. We use the X/Open standard on ASN.1/C++ translation so that
our tool can be realised with pluggable components. We have used EASN
to validate a simplified RLC in the W-CDMA (3G GSM) stack. In this
short paper, we discuss the EASN language, the tool, and an example
usage.



1 Introduction 

Next-generation protocols for mobile devices have become very complex and it is
becoming increasingly difficult for standards bodies to be sure of the correctness
of protocols during the standardization process. This has become an impediment
in defining new standards. What one needs is a way of specifying an evolving
protocol and having some confidence that, at a certain level of abstraction, the
protocol is consistent in spite of modifications.



Why ASN.1? There are languages like Promela that can be used, but their
data structuring capabilities do not match those of ASN.1, for instance, which is
widely used in telecommunication protocol specification. It would help the
standardization process if a model checker could be augmented with ASN.1 data
modeling capabilities to check correctness of interim versions of a protocol
before establishing a standard.

* Supported by funding from Nokia Research Center, under SID project 99033.

ASN.1 separates data modeling into abstract and transfer syntax. The
abstract syntax only specifies the universe of abstract values that can be assumed
by variables in the model without any concern for how they are mapped to a
particular machine, compiler, OS, etc. Hence from the point of view of model
checking, an abstract syntax constrains the state space as much as possible if
there is a mechanism by which a system state vector can be encoded with exactly
the possible values of its constituent substates. The latter is a chief feature
of the state compaction infrastructure that has been developed for the EASN
system described here. ASN.1 has a subtyping feature with a well-developed
notation for expressing constraints. Note that data here actually means the control
data in the protocols and hence our concerns are different from those approaches
that exploit symmetry, etc. We derive our EASN tool by marrying ASN.1 with
the well-known model checker SPIN.



Why SPIN? SPIN is an effective model checking tool for asynchronous
systems, especially designed for communication protocols. Nondeterminism and
guarded commands in Promela (the input language of SPIN) make it convenient to
express the behavior of communicating protocol entities. SPIN has many capabilities
for validation of safety and liveness properties. Algorithms that effect
substantial space and time savings, like bit-state hashing, on-the-fly model checking
and partial-order reduction, have been incorporated into SPIN.

SPIN has a simulator that randomly checks only a portion of the state 
space and also a (generated) validator that can attempt to exhaustively check 
the state space of the system or can use techniques like bit-state hashing to check 
a substantial portion of the state space with a fairly high level of assurance. Our 
EASN system also has these components. 
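
The idea behind bit-state hashing can be sketched in Python as follows. This is a generic illustration of the technique, not the code of SPIN or EASN: the class name `BitState`, the use of SHA-256 and the choice of two hash positions per state are assumptions made for the example. Instead of storing each visited state in full, only a few bits in a large bit array are set, so two different states may occasionally collide and a new state may be mistaken for a visited one.

```python
import hashlib

class BitState:
    """Approximate visited-set that stores a few bits per state."""

    def __init__(self, num_bits, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, state):
        # Derive `num_hashes` bit positions from the state bytes.
        for k in range(self.num_hashes):
            digest = hashlib.sha256(bytes([k]) + state).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def visit(self, state):
        """Mark `state` (a bytes object); return True if it looked new."""
        new = False
        for pos in self._positions(state):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                new = True
                self.bits[byte] |= mask
        return new

table = BitState(num_bits=1 << 16)
print(table.visit(b"state-0"))  # True: first visit
print(table.visit(b"state-0"))  # False: already marked
```

The memory needed is fixed by the size of the bit array rather than by the size of a state vector, which is why such a search covers a large part of the state space but cannot guarantee that it covers all of it.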



EASN Language. ASN.1 can only be used to define the datatypes and constant
values in an application. Promela, however, is a complete language with a set
of basic data types and a typedef construct to help users compose datatypes, and
control constructs that are used to define the behavior of protocol entities.

The EASN language replaces all the datatyping capabilities of Promela
with ASN.1. Hence, none of the data types of Promela are retained in EASN,
except the chan construct. As ASN.1 has richer and more expressive datatypes
compared to Promela, EASN needs to overload the semantics of many of the
operators of Promela, so as to support a natural set of operations on data. In
addition, we have also augmented the set of operators as necessary. In brief,

EASN = Promela - {mtype, typedef, bit, byte, bool, short, int} + ASN.1
+ overloaded semantics for existing operators + a few new operators.
2 EASN, the Verification Tool

Encoding State Efficiently: SPIN represents state quite efficiently but, for
reasons of alignment, allows padding and other extraneous matter in the state
vector. Since our system uses ASN.1 data modules, we require that all variables
be as constrained as possible in the space of values that they can take through
the use of subtyping. For instance, if there are only two variables that are
constrained to (say) 5..7 and 3..7, there are only 15 possibilities, which can
be represented in only 4 bits instead of either 2+3 (5 bits) or, worse, 3+3 (6 bits).¹
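
The bit counts in this example can be reproduced with a short Python sketch; the helper name `bits_for_count` is hypothetical.

```python
from math import prod

def bits_for_count(count):
    """Bits needed to distinguish `count` different values."""
    return (count - 1).bit_length()

ranges = [(5, 7), (3, 7)]                    # inclusive subtype ranges
sizes = [hi - lo + 1 for lo, hi in ranges]   # [3, 5]

joint = bits_for_count(prod(sizes))          # all 15 combinations together
per_variable = sum(bits_for_count(s) for s in sizes)
unconstrained = sum(bits_for_count(hi + 1) for _, hi in ranges)

print(joint, per_variable, unconstrained)    # 4 5 6
```

Encoding the combination as a whole saves a bit over encoding each constrained variable separately, and two bits over storing each variable as a plain 0..7 value.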

Our state compaction infrastructure views the state space of the system as a 
multi-dimensional array (with one dimension for every component of the state), 
and consequently, every state of the system, as a point in this multi-dimensional 
space. We use column-major linearisation. 
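
A column-major linearisation of such a state can be sketched in Python as below. This is an illustration of the general scheme under the assumption that each state component is an integer with an inclusive range; the function names `strides`, `encode` and `decode` are hypothetical and do not come from the EASN sources.

```python
def strides(sizes):
    """Column-major strides: the first component varies fastest."""
    result, step = [], 1
    for size in sizes:
        result.append(step)
        step *= size
    return result

def encode(values, ranges):
    """Map a state (one value per component) to a single index."""
    sizes = [hi - lo + 1 for lo, hi in ranges]
    return sum((v - lo) * s
               for v, (lo, _), s in zip(values, ranges, strides(sizes)))

def decode(index, ranges):
    """Inverse of `encode`."""
    values = []
    for lo, hi in ranges:
        index, offset = divmod(index, hi - lo + 1)
        values.append(lo + offset)
    return values

ranges = [(5, 7), (3, 7)]
print(encode([5, 3], ranges))   # 0
print(encode([7, 7], ranges))   # 14
print(decode(14, ranges))       # [7, 7]
```

The 15 states of the two-variable example map exactly onto the indices 0..14, which is what allows them to be stored in 4 bits.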

SPIN does various kinds of state compaction, and in EASN we have a
comparable mechanism for most of them that performs at least as well in space. But
some are unnecessary in EASN. Geldenhuys and Villiers also attempt state
compression in SPIN along lines similar to ours, by adding a simple construct
to Promela, but with restrictions. For example, different orders of process
activation along different execution paths are forbidden in their approach, as much of
the state component placement is done statically. The ranges of their variables
must start at zero. We do not have such restrictions.



The EASN Tool: SPIN is open source. We intend EASN to be open source
too. NRC has an ASN.1 parser that we could use, but we did not want to
prevent others from using EASN as open source. We have therefore used the
X/Open ASN.1/C++ translator standard to architect the tool so as to enable other
users besides us and NRC to realise it by plugging in any compliant ASN.1/C++
translator into the system. A block diagram of the EASN system is given in the
accompanying figure.

An EASN system specification (for simulation/verification) consists of two
compilation units: one containing all the ASN.1 modules (the dEASN spec.),
which is parsed by the ASN.1/C++ translator to generate C++ source, and the
other containing the behavioral specification of the protocol entities (the cEASN
spec.), which is parsed by the EASN parser (a modified Promela parser, derived
from SPIN). It is the variable declarations in the cEASN spec. that tie it to the
dEASN spec., as their types are defined in the ASN.1 modules. The EASN parser
imports all the relevant information regarding a type from the generated C++
source by querying its meta-data interface.

The Parser and Simulator: The executable that can parse and simulate a
given cEASN spec. is obtained by linking the C++ generated by the translator
(corresponding to the associated dEASN spec.) with all the (appropriately
modified) SPIN modules. This executable is the EASN tool. The EASN simulator
needs to access data values and modify them through permitted operations.

¹ Experienced ASN.1 users may note that such an encoding is even better than the
often very compact PER encoding.
Rectangles in this figure refer to executables.

Ellipses refer to source code, either manually developed
or automatically generated.

Rounded rectangles refer to s/w modules at a coarse-grain
level of abstraction.

The hexagon contains data structures internal to an application.

dEASN: The ASN.1 data modules for an EASN spec.
cEASN: The control aspect (behaviour of protocol entities).

SPIN: contains three main modules, the Parser, the Simulator and
the pan-Generator. LTL-translation and GUI are the other two
modules of SPIN that we inherit into EASN without modification.

EASN: Given an EASN spec., the ASN.1/C++ translator is first
invoked to generate the C++ sources, which are compiled and linked with
the rest of the (appropriately modified versions of) SPIN sources
to obtain the EASN executable, which can then parse the cEASN spec.

EASN-pan: The EASN-generated pan.[thmcb] files should
be compiled and linked with the sources generated by the translator,
the generated compaction information, and one additional
State-Vector Component module, to obtain the Protocol-ANalyser.



However, since the simulator engine has no knowledge of the specific ASN.1
types that might be used in different EASN specifications, these data operations
must be carried out using the ASN.1/C++ Generic Data Interface that supports
operations on objects conforming to the ASN.1 data model.



The Generated Validator: SPIN generates C code that is compiled to obtain 
the validator. EASN, however, generates C++ code that has to be linked with 
the code generated by the ASN.1/C++ translator, the code generated by the 
Compaction information generator and the compaction infrastructure to obtain 
the validator. 

The compaction information is a set of C++ functions that export all
information about value constraints expressed in the original ASN.1 spec. through
the C++ interface as required by the generic compaction infrastructure module.
Through these two additional components of our framework, we implement
incrementally the computation of the linearised representation of the state of the
system that needs to be stored into the hash table, and also the hash value of
the bucket in the table.
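
One way to read "incrementally" is that, when a transition changes a single state component, the linearised index is adjusted by the change in that component rather than recomputed from scratch. The Python sketch below illustrates this reading only; it is an assumption about the idea, not a description of the EASN implementation, and the names `full_index` and `update` are hypothetical. The incremental hash computation is not modeled.

```python
def full_index(values, lows, strides):
    """Linearised index computed from scratch."""
    return sum((v - lo) * s for v, lo, s in zip(values, lows, strides))

def update(index, position, old, new, strides):
    """Adjust the index after component `position` changed from old to new."""
    return index + (new - old) * strides[position]

lows = [5, 3]        # lower bounds of the ranges 5..7 and 3..7
strides = [1, 3]     # column-major strides for component sizes 3 and 5

index = full_index([5, 3], lows, strides)          # 0
index = update(index, 1, old=3, new=6, strides=strides)
print(index, full_index([5, 6], lows, strides))    # 9 9
```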



RLC/ABP Examples: We have used EASN to validate a simplified RLC in
the W-CDMA (3G GSM) stack. It uses less memory but more time than SPIN.
Further details of the performance of EASN have been submitted to the FMICS
workshop. Because the RLC model is long, we present a much simpler ABP protocol
in figure 2. Note that the state vector for EASN is half the size of SPIN's.



Correctness of Implementation vis-a-vis SPIN: In crafting EASN from 
SPIN, we identified the following invariant that could be a necessary and suffi- 
cient condition to convince oneself that our implementation is sane: 

Given a Promela spec s and a cEASN spec e, derived from s by changing 
its variable types to equivalent ASN.1 types (defined in an ASN.1 module 
appropriately imported into EASN): A. Simulation runs of SPIN over s 
and of EASN over e should select identical sequences of state-transitions 
for the same seed value; B. The sequence in which the reachable states of 
the system are visited by the generated validators (by SPIN for s and by 
EASN for e) must be identical (for exhaustive searches), with or without 
partial-order reduction or never-claims. 

EASN preserves this invariant for all the tests that we have tried so far. 
1: The cEASN Spec.

     1  /* mtype = { msg0, msg1, ack0, ack1 }; */
     2
     3  chan sender = [1] of { asn::MtypeAbp };
     4  chan receiver = [1] of { asn::MtypeAbp };
     5
     6  inline recv(cur_msg, cur_ack, lst_msg, lst_ack) {
     7    do
     8    :: receiver?cur_msg -> sender!cur_ack; break
     9    :: receiver?lst_msg -> sender!lst_ack
    10    od;
    11  }
    12
    13  inline phase(msg, good_ack, bad_ack) {
    14    do
    15    :: sender?good_ack -> break
    16    :: sender?bad_ack
    17    :: timeout ->
    18         if
    19         :: receiver!msg;
    20         :: skip /* lose message */
    21         fi;
    22    od
    23  }
    24
    25  active proctype Sender() {
    26    do
    27    :: phase(msg1, ack1, ack0);
    28       phase(msg0, ack0, ack1)
    29    od
    30  }
    31  active proctype Receiver() {
    32    do
    33    :: recv(msg1, ack1, msg0, ack0);
    34       recv(msg0, ack0, msg1, ack1)
    35    od
    36  }

2: The dEASN Spec.

     1  Easn DEFINITIONS ::=
     2  BEGIN
     3
     4    ...
     5    MtypeAbp ::= ENUMERATED {
     6      msg0, msg1, ack0, ack1
     7    }
     8  END

3: The SPIN-pan output.

    State-vector 24 byte, depth reached 9, errors: 0
          12 states, stored
           3 states, matched
          15 transitions (= stored+matched)
           0 atomic steps
    hash conflicts: 0 (resolved)
    (max size 2^18 states)

4: The EASN-pan output.

    State-vector 12 byte, depth reached 9, errors: 0
          12 states, stored
           3 states, matched
          15 transitions (= stored+matched)
           0 atomic steps
    hash conflicts: 0 (resolved)
    (max size 2^18 states)

The original Promela Spec can be recovered from the cEASN Spec by 
uncommenting line #1. 



Fig. 1. ABP in SPIN and EASN. 
1 Introduction 

Model checking is an automated procedure for determining whether a finite 
state program satisfies a temporal property. Model checking tools, due to the 
complex nature of the specification methods, are used most effectively by 
verification experts. In order to make these tools more accessible to non-expert 
users, who may not be familiar with these formal notations, we need to make 
model checkers easier to use. Visually intuitive specification methods may 
provide an alternative way to specify temporal behavior. 

One such visual notation that is already widely used in industrial practice to 
specify the timing behavior of hardware systems is timing diagrams. Synchronous 
Regular Timing Diagrams (SRTDs) are a class of timing diagrams that 
correspond to regular languages. SRTDs are a very effective formal specification 
notation since (1) they have a simple syntax and semantics that corresponds to 
common usage, and (2) there are efficient linear-time model checking algorithms 
for SRTDs. 

Compositional reasoning ameliorates the state explosion problem by reduc- 
ing reasoning about the entire system to reasoning about individual components. 
One flavor of compositional reasoning is assume-guarantee reasoning where each 
component guarantees certain properties based on assumptions about other com- 
ponents. There are several difficulties in applying assume-guarantee reasoning: 
firstly, decomposing the specification is essential, and secondly, auxiliary asser- 
tions are often necessary. These tasks require a non-trivial amount of manual 
effort. The decompositional nature of SRTDs, however, makes it possible to do 
assume-guarantee style compositional reasoning in an efficient and fully 
automated manner. 

The Regular Timing Diagram Translator (Rtdt) tool provides a user-friendly 
graphical editor for creating and editing SRTDs, plus a translator that 
implements the compositional and non-compositional model checking algorithms. 
Rtdt forms a formal and efficient timing diagram interface to the model checker 
COSPAN. 

*** Supported in part by NSF 980-4736 and TARP 003658-0650-1999. 
2 Synchronous Regular Timing Diagrams 

An SRTD is specified by describing a number of waveforms over a number of 
clock cycles. The clock is depicted as a special waveform that is defined over 
{0,1} where the value toggles at consecutive points. In SRTDs, any change in 
the signal value must occur at either the rising edge or falling edge of the clock 
waveform. 







Fig. 1. Annotated Synchronous Regular Timing Diagram. 

A waveform at any point may be either 0 (low), 1 (high), or one of two don’t 
cares. The don’t care value specifies that the value at that point is unimportant 
and can be either 0 or 1. The don’t care transition specifies that the value of 
the signal changes exactly once and remains stable for the remainder of the 
specified interval. A pause specifies that all the signals, except the clock, remain 
unchanged for an arbitrary but finite period of time until a definite change in 
value of at least one waveform indicates the end of the pause. 

The waveforms are partitioned into an initial precondition part and the 
following postcondition part. It has been shown that we can construct regular 
expressions for the precondition $T_{pre}$ and the postcondition $T_{post}$ of 
an SRTD $T$. An infinite computation $\sigma$ satisfies an SRTD $T$ (written 
$\sigma \models T$) if and only if any finite sub-computation that satisfies 
$T_{pre}$ is immediately followed by a sub-computation that satisfies $T_{post}$. 
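
The satisfaction condition can be tried out on finite traces with ordinary regular expressions. The sketch below is only an approximation of the definition above, under stated assumptions: each clock step is encoded as one character, the pre- and postcondition are given as Python `re` patterns, and a precondition match whose postcondition would run past the end of the finite trace is reported as a violation. The function name `satisfies` is hypothetical and this is not the translation algorithm used by Rtdt.

```python
import re


def satisfies(trace: str, pre: str, post: str) -> bool:
    """Check the pre/post condition on a finite trace of one-char symbols."""
    pre_re, post_re = re.compile(pre), re.compile(post)
    n = len(trace)
    for i in range(n + 1):
        for j in range(i, n + 1):
            if pre_re.fullmatch(trace, i, j) is None:
                continue
            # trace[i:j] satisfies the precondition, so some segment that
            # starts exactly at j must satisfy the postcondition.
            if not any(post_re.fullmatch(trace, j, k) for k in range(j, n + 1)):
                return False
    return True
```

With `r` standing for a request, `a` for an acknowledgement and `-` for an idle step, `satisfies("--r-a-ra", "r", "-?a")` accepts a trace in which every request is acknowledged within two steps, while `satisfies("--r--a", "r", "-?a")` rejects one where the acknowledgement comes too late.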

3 The Rtdt Tool 

The main features of the Rtdt tool are described below. 

— Rtdt has a user-friendly editor for graphically creating and editing SRTDs. 

— Non-compositional verification - The translation algorithm generates an 
$\omega$-NFA for the complement of the SRTD. This $\omega$-NFA can be used as the 
property in the automata-theoretic approach to model checking, resulting in 
a model checking procedure that is linear both in the size of the system and 
the SRTD specification. 

— Assume-guarantee reasoning - An SRTD can be partitioned into bundles of 
waveforms called fragments such that each fragment contains all the 
waveforms controlled by an implementation module. The translation algorithm, 
with a minor modification, is used to generate an $\omega$-NFA for each such 
fragment. There is also an algorithm to automatically generate auxiliary 
processes from an SRTD such that the parallel composition of these processes 
generates the language of the SRTD (see [5] for details). These algorithms 
can be used, in a fully automated way, with an assume-guarantee proof rule 
[3] that is sound and complete for both safety and liveness properties. The 
model checking process is very efficient, linear in the size of the system and 
the diagram. 

— The user can execute COSPAN from within Rtdt. When a verification check 
fails, Rtdt displays the resulting error trace as an SRTD and allows the 
option of editing this diagram. 



4 Case Studies 

Rtdt has been used with COSPAN to verify timing diagram properties of a 
number of interesting examples, such as a memory access controller and Lucent’s 
PCI Interface Core. Rtdt was used to automatically generate the $\omega$-NFA for 
the complement of the SRTD property and the auxiliary processes. COSPAN was 
used to discharge the proof obligations in the assume-guarantee proof rule. 

The verification checks were done both compositionally and non-compositionally. 
With the compositional checks we observed significant reductions in BDD size, 
space and time required. In the memory access controller example, we saw 
savings of 21% to 69% in BDD size. For the PCI Interface Core, we formulated 
the SRTD properties from the actual diagrams found in the PCI Local Bus 
specification. The PCI Interface Core yielded more dramatic results; we 
observed a reduction in BDD size of 41% to 84%. Some non-compositional 
verification checks failed to complete due to a shortage of memory, but all 
the compositional checks completed successfully. 

5 Conclusions and Related Work 

Various researchers have investigated the use of timing diagrams in formal 
verification. SACRES [4] is a verification environment for embedded systems that 
allows users to graphically specify properties as Symbolic Timing Diagrams 
(STDs). The monolithic translation algorithms for STDs may be exponential. 
In later work, a compositional verification methodology is used to 
verify STD properties. This work uses timing diagrams as a convenient 
notation for expressing temporal properties, while the assume-guarantee reasoning 
is left to the verifier. Fisler provides a procedure to decide regular language 
containment of non-regular timing diagrams, but the model checking algorithms 
have a high complexity (PSPACE). They [2] have implemented a monolithic 
translation algorithm that compiles a regular subset of these diagrams into 
$\omega$-automata. Unlike our work, however, they do not address temporal ambiguity. 
Another approach uses Presburger formulas to determine whether the delays 
and guarantees of an implementation satisfy constraints specified as a timing 
diagram. The algorithm for verifying Presburger formulas is multi-exponential. 

We have outlined the key features of the tool Rtdt, which is based on a visual 
specification formalism called Synchronous Regular Timing Diagrams (SRTDs). 
Rtdt consists of an editor that allows a user to graphically create and edit 
an SRTD. The tool implements an efficient model checking algorithm that is 
linear in both the size of the system and the SRTD specification. Rtdt also 
implements a sound and complete assume-guarantee proof rule [2] that can be 
applied to SRTDs in a fully automated way. Rtdt will be integrated into an 
upcoming release of the industrial verification tool FormalCheck. 
The correct behavior of real-time applications depends not only on the 
correctness of the results of computations but also on the times at which these 
results are produced. As a matter of fact, violations of real-time constraints in 
embedded systems are the most difficult errors to detect, because they are 
extremely sensitive both to the patterns of external events stimulating the system 
and to the timing behavior of the system itself. Clearly, the development of 
real-time systems requires rigorous methods and tools to reduce development costs 
and “time-to-market” while guaranteeing the quality of the produced code (in 
particular, respect of the temporal constraints). 

The above requirements motivated the development of the Taxys tool, 
dedicated to the design and validation of real-time telecommunications software. 
One of the major goals of the Taxys tool is to produce a formal model that 
captures the temporal behavior of the whole application, which is composed of 
the embedded computer and its external environment. For this purpose we use 
the formal model of timed automata. The choice of this model allows the 
use of available results, algorithms and tools. Here, we use the KRONOS model 
checker for model analysis. 

From the source code of the application, an Esterel program annotated 
with temporal constraints, the Taxys tool produces on one hand a sequential 
executable code and on the other hand a timed model of the application. This 
model is again composed with a timed model of the external environment in 
order to obtain a global model which is statically analyzed to validate timing 
constraints. This validation should notably shorten design time by limiting te- 
dious test and simulation sessions. 

1 Taxys 

The objective of the Taxys project is to propose a framework for developing real- 
time embedded code and verifying its correct behavior with respect to quantitative 
timing constraints. 

* This work is supported by the RNRT project TAXYS and the ITEA-DESS project 
We use Esterel as the development language of the application. This 
language provides powerful constructs for management of parallelism and 
exceptions. It has rigorously defined semantics. Esterel programs run in a single 
thread on a single processor with a non-preemptive interrupt routine and can 
refer to external data and routines written in C for complex (numerical) 
computations. Thus, the application is decomposed into a control part, written in 
Esterel, and a functional part, written in C, and it is compiled with the 
Esterel compiler Saxo-RT. 

The use of synchronous languages for the development of real-time reactive 
applications relies on a “synchrony assumption” meaning that the application 
reacts infinitely fast with respect to its environment. This assumption, very 
convenient in practice, must be validated for a given implementation on a target 
machine. In practice, validating the synchrony assumption amounts to showing 
that the environment does not take too much lead over the application. This 
requires the use of a “realistic” synchrony assumption strongly depending on 
the application, on the speed of the machine and on its interactions with the 
environment. To interface the real-time system with its environment, we use an 
external event-handler generated by Saxo-RT from an ad-hoc specification, 
which precisely takes into account the way external events are captured 
by the interrupt mechanisms and sent to the application. 

The behavior of such systems can be modelled by the composition of three 
systems represented as automata: the application automaton $\mathcal{A}$, the 
external event handler $\mathcal{H}$, which abstracts the behavior of the 
interrupt routine and buffers external events before they are taken into account 
by the next synchronous reaction, and the environment model $\mathcal{E}$, which 
specifies the scenarios in which the application must run. 

The environment of a real-time embedded system can exhibit different be- 
haviors that must be captured by some non-deterministic model. As Esterel 
programs are deterministic, we add a non-deterministic instruction npause to the 
Esterel language. The environment can thus be written in the same language 
as the application. The timing constraints are specified directly by pragmas in 
the Esterel code of the application $\mathcal{A}$ and the environment $\mathcal{E}$. 

The Taxys design flow is shown in Fig. 1. Saxo-RT generates three C modules 
which compute the transition functions of $\mathcal{A}$, $\mathcal{H}$ and 
$\mathcal{E}$ (application, event handler and environment); the model of the 
application contains the embedded code itself. Kronos explores the system state 
space by composing $\mathcal{A}$, $\mathcal{H}$ and $\mathcal{E}$ on-the-fly. 
Thus, no intermediate state explosion occurs before composition and only 
reachable states are computed. If any timing constraint is violated, a trace 
leading to this error is generated. This trace is then re-executed step by step 
in the Saxo-RT graphical debugger to provide the user with more precise 
diagnostics. 
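
The benefit of on-the-fly composition, namely that only reachable states are ever built and that an error comes with a trace, can be seen in a small untimed sketch. Everything in it is an assumption made for illustration: the toy state is just the number of buffered events, the environment may emit up to `burst` events per step, and the application consumes one event per step. It is plain breadth-first search, not the symbolic timed-automata analysis performed by Kronos.

```python
from collections import deque

OVERFLOW = ("overflow",)


def make_successors(capacity: int, burst: int):
    """Compose environment, handler and application lazily, one step at a time."""
    def successors(state):
        if state == OVERFLOW:
            return
        (pending,) = state
        for arrivals in range(burst + 1):       # non-deterministic environment
            buffered = pending + arrivals       # handler buffers the events
            if buffered > capacity:
                yield OVERFLOW
            else:
                yield (max(0, buffered - 1),)   # application consumes one event
    return successors


def explore(initial, successors, is_error):
    """Breadth-first search; return a trace to the first error found, or None."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_error(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

Calling `explore((0,), make_successors(2, 2), lambda s: s == OVERFLOW)` returns a shortest trace to a buffer overflow, while with `burst=1` the search terminates without finding one.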

2 Timing Analysis 

We make the following assumption on the temporal behavior of the application: 
execution time is spent in the functional part to compute C-functions which 
have been previously instrumented by profiling. The Esterel code is annotated 
with this information. This hypothesis is true for many reactive applications if 
the embedded code has been compiled efficiently. We then specify two kinds 
of real-time constraints: throughput and deadline constraints. A throughput 
constraint is a global constraint and expresses the fact that the system reacts fast 
enough for a given environment model. The violation of a throughput constraint 
corresponds to an overflow of the buffer of the external event handler 
$\mathcal{H}$. A deadline constraint is “local” and expresses, for example, a 
maximum delay between a given input and a given output of the system. 

Fig. 1. Taxys Design Flow. 

This approach is illustrated by the toy example “pulse” in Fig. 2, which is 
composed of two parallel tasks. The first, triggered by input A, calls filter F. 
The second, triggered by B, computes some correction G on an actuator using 
the result of function F. F (resp. G) consumes between Fmin (resp. Gmin) and 
Fmax (resp. Gmax) CPU time. The buffer size of the external event handler 
$\mathcal{H}$ is 1. 

The throughput constraint is specified by the environment model written in 
timed Esterel (Fig. 5). It is composed of two independent periodic tasks, the 
first one strictly periodic with a period TA and the second one with a period TB 
jittered by an interval [0,e], for some constant e. 

There are two deadline constraints on functions F and G (Fig. 3): 
($\mathcal{D}_1$) F must terminate within d1 time units after the arrival of 
event A, and ($\mathcal{D}_2$) G must compute the value of the actuator with 
data not older than d2 time units, i.e., G terminates at most d2 time units after 
the arrival of the last event A that was consumed by function F. The annotated 
application code is given in Fig. 4: $\mathcal{D}_1$ is specified by the pragma 
0 < clock(lastA) < d1, and $\mathcal{D}_2$ by the two pragmas Y = clock(lastA) 
(which starts a new clock each time F is executed) and 0 < Y < d2. 
Fig. 2. The “pulse” Example. 

    loop
      await A ;   %{# Y=clock(lastA) %}
      call F() ;  %{# Fmin<CPU<Fmax %}
                  %{# 0<clock(lastA)<d1 %}
    end loop

    loop
      await B ;
      call G() ;  %{# Gmin<CPU<Gmax %}
                  %{# 0<Y<d2 %}
    end loop

Fig. 4. Application Code. 

Fig. 3. Deadline Constraints. 

    loop
      npause ;  %{# TA<ca<TA; ca:=0}
      emit A ;
    end loop

    loop
      npause ;  %{# TB<cb<TB+e; cb:=0}
      emit B ;
    end loop

Fig. 5. Environment Code. 



3 Experimental Results 



We used Taxys to verify the Esterel code for the communication mode of 
a GSM terminal developed by Alcatel (815 Esterel lines and 48000 C lines). 
We found 4 scenarios leading to deadline violations caused by a wrong scheduling 
between two C-functions. 

We present here results obtained on a digital phone prototype carrying 
simultaneously voice and data produced by a graphic tablet, implemented on a 
32 MIPS Digital Signal Processor. Audio data are processed at 8 kHz and their 
processing consumes 3900 of the 4000 CPU cycles available every 125 μs. 
Graphic tablet data are compressed by a vectorization algorithm which 
sporadically consumes between 15000 and 20000 CPU cycles. Six experiments were 
carried out with the same Esterel code for the application but with different 
environment models and handler buffer sizes: ISDN1 and ISDN2 with an 
environment model composed of two strictly periodic and independent tasks (the 
first carrying audio data at 8 kHz and the second the graphic tablet data at 
100 Hz); ISDN3 and ISDN4 with the second task being aperiodic and emitting 
bursts at rates varying in a non-deterministic manner between 25 and 100 Hz; 
ISDN5 and ISDN6 with a third additional periodic task modelling switching 
between several audio modes. In all cases, the application $\mathcal{A}$ consists 
of 3000 C lines and 258 Esterel lines, and the environment $\mathcal{E}$ of 
120 Esterel lines. 
Results presented in Table 1 show that a buffer size of at least 6 is 
necessary for absorbing the sporadic task. We observe that the number of symbolic 
states explored by Kronos increases exponentially with the “degree” of 
non-determinism of the environment. Therefore, to cope with state explosion due to 
environment non-determinism, it is necessary to find appropriate environment 
model approximations preserving the verified properties. 



Table 1. Taxys Experimental Results. 



Name    Buff. size   Symb. states   Verif. time   Diagnostic 
ISDN1   5            2 200          1.27 s        buffer overflow 
ISDN2   6            10 849         5 s           OK 
ISDN3   5            15 894         6.29 s        buffer overflow 
ISDN4   6            633 472        10 min 47 s   OK 
ISDN5   5            22 695         13.6 s        buffer overflow 
ISDN6   6            > 10’’         7             aborted 



4 Conclusion 

We have presented an original approach for specifying, designing and validat- 
ing real-time embedded systems. This approach is implemented in an entirely 
automated tool applicable to industrial size examples. Specifications are writ- 
ten in a user friendly and compositional formalism which does not require from 
the user any knowledge about timed automata or temporal logic. Its limitations 
are mainly those of model-checking techniques. Any advance in these techniques 
can be taken into account, transparently for the user. Furthermore, because the 
embedded code is effectively executed during validation, the validation is trust- 
worthy and is therefore particularly suited to safety critical applications. 
Abstract. Compositional model checking is used to verify a processor 
microarchitecture containing most of the features of a modern micropro- 
cessor, including branch prediction, speculative execution, out-of-order 
execution and a load-store buffer supporting re-ordering and load for- 
warding. We observe that the proof methodology scales well, in that the 
incremental proof cost of each feature is low. The proof is also quite 
concise with respect to proofs of similar microarchitecture models using 
other methods. 



1 Introduction 

Compositional model checking methods reduce the proof of a complex system, 
through decomposition and abstraction, to a set of lemmas that can be ver- 
ified by a model checker. It has been shown that the proof of systems with 
unbounded or infinite state can be reduced to tractable model checking prob- 
lems on finite state abstractions. For example, an instruction processing unit 
using Tomasulo’s algorithm m was proved using the method |McMflfl| for 
unbounded resources. The proof was substantially simpler than that of a similar 
model using a general purpose theorem prover EEnn. The safety proof involved 
just three simple lemmas verified by a model checker. The relative simplicity of 
the proof using compositional model checking owed principally to the lack of user 
generated inductive invariants and the lesser need for manual proof guidance. 
Nonetheless, the important question of the scalability of the method remains 
open. That is, does the manual proof effort increase in reasonable proportion to 
the size and complexity of a system? 

We approach this question by considering the verification of a complete pro- 
cessor microarchitecture, containing most of the important features of a modern 
microprocessor. These include branch prediction, speculative execution, out-of- 
order execution (with in-order retirement and clean exceptions) and a load-store 
buffer supporting re-ordering and load forwarding. The question is whether the 
complexity of the proof increases by some reasonable increment with each new 

* Supported by SRC contract 99-TJ-683.003, AFOSR MURI grant F49620-00- 1-0327, 
NSF Theory grant CCR-9988172 
architectural feature, or whether it increases intractably, making proofs of 
complex systems impractical. We find that the incremental proof cost of each 
tectural feature is small (just a few additional lemmas) and that the interaction 
of these features, though complex, does not make the proof expand intractably. 

The microarchitecture model that we verify is similar in its feature set to 
models that have been verified using theorem proving methods |FI(fSnfliSH98| . 
We compare our proof to the proofs obtained by these methods, with emphasis 
on the use of inductive invariants and its effect on proof complexity. 

Section 2 provides a brief overview of the proof method. Then section 3 
describes the microarchitecture model that we verified, and its specification. In 
section 4 we discuss the proof, and consider the question of scalability. 
Section 5 compares the proof with proofs obtained previously for similar 
microarchitectures. In section 6 we conclude with some remarks on the strengths and 
weaknesses of the method, and how the weaknesses might be addressed. 



2 Background 

To verify the microarchitecture, we use the SMV proof assistant jMcMOdj . This 
tool supports the reduction of correctness conditions for unbounded or infinite- 
state systems to lemmas that can be verified by model checking. The general 
approach is to divide the intended computation into “units of work” that use 
only finite resources in the implementation, such as instructions in a processor, 
or packets in a packet router. Correctness of a given unit of work is then reduced 
to a finite state problem using a built-in collection of abstract interpretations. 
In effect, we disregard those components of the system state not involved in the 
given unit of work. Because specifications can be temporal, we avoid the need 
to write and verify an inductive invariant of the system. Instead, we exploit the 
model checker’s ability to compute the reachable states (strongest invariant) of 
the abstract models. This greatly simplifies the proofs. 

The Proof Methodology. A system is specified with respect to a reference 
model. For a processor, this is an “instruction set architecture” (ISA) model that 
executes one instruction at a time in program order. The correctness condition is 
a temporal property relating executions of the implementation to executions of 
the reference model. We decompose correctness into “units of work” by specifying 
refinement relations. These are temporal properties specifying the data values 
at internal points in the implementation in terms of the reference model. For 
example, in a processor we may specify the operands read from the register file 
and the results computed by the ALU. To make such specifications possible, we 
may add auxiliary state variables that record the correct data values as they 
are computed by the reference model. A definitional mechanism in the proof 
assistant allows us to add auxiliary variables in a sound manner. 

Mutually Inductive Temporal Proofs. The refinement relations are then 
proved by mutual induction over time. Each refinement relation is a temporal 
property of the form $G\phi$, meaning that $\phi$ is true at all times $t$. To 
prove that $\phi$ is true at time $t$, we may assume by induction that the other 
refinement relations hold for all times less than $t$. This is useful in a 
methodology based on model checking, because the notion that $q$ up to time 
$t - 1$ implies $p$ at time $t$ can be expressed in temporal logic as 
$\neg(q \, U \, \neg p)$. Hence, this proposition can be checked 
by a model checker.¹ This mutually inductive approach is important to the proof 
decomposition. It allows us to assume, for example, when proving correctness 
of an instruction’s source operand, that the results of all earlier instructions 
have been correct. Note that this is quite different from the method of proof 
by invariant, in which we show that some state property at time $t - 1$ implies 
itself at $t$. Here the properties are temporal, and the inductive hypotheses are 
assumed for all times less than $t$, and not just at $t - 1$. This is important, 
since it allows us to avoid writing auxiliary invariants. 
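
To make the induction step concrete, the following sketch evaluates the formula "not (q until not p)" on finite traces given as lists of truth values, one per time step. It is an illustration only: the function name and the finite-trace encoding are assumptions, and the SMV proof assistant checks such formulas symbolically rather than by enumerating a trace.

```python
from typing import Sequence


def inductive_step_holds(p: Sequence[bool], q: Sequence[bool]) -> bool:
    """Finite-trace check of not (q U not p).

    The formula fails exactly when there is a time t at which p is false
    although q has held at every time before t.
    """
    witness = any(not p[t] and all(q[:t]) for t in range(len(p)))
    return not witness
```

If both `inductive_step_holds(p, q)` and `inductive_step_holds(q, p)` return True, then p and q hold at every step of the trace: an earliest failure of either property would be a witness against the check for that property, because the other property still holds at all earlier times.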

Temporal Case Splitting. Next we specialize the properties we wish to prove, 
so that they depend on only a finite part of the overall state. For example, 
suppose there is a state variable $v$, which is read and written by processes 
$p_1 \ldots p_n$. We wish to prove a property $G\phi$ of $v$. We add an auxiliary 
state variable $w$ which points to the most recent writer of variable $v$. Now, 
suppose we can prove for all process indices $i$ that 
$G((w = i) \Rightarrow \phi)$. That is, $\phi$ holds whenever the 
most recent writer is $p_i$. Then $G\phi$ must hold, since at all times $w$ must 
have some value. We call this “splitting cases” on the variable $w$, since it 
generates a parameterized property with one instance for each value of $w$. For a 
given value of $i$, we may now be able to abstract away all processes except 
$p_i$, since the case $w = i$ depends directly only on process $p_i$. 
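
The case-splitting argument can be replayed on a finite trace. In this sketch each step of the trace is a pair (most recent writer, truth value of the property); the representation and the function names are assumptions chosen for the example, not part of the proof assistant.

```python
def case_holds(trace, i):
    """The case for writer i: the property holds whenever the last writer is i."""
    return all(phi for w, phi in trace if w == i)


def holds_by_case_split(trace, writers):
    """Conjunction of all cases; writers must cover every value the writer can take."""
    return all(case_holds(trace, i) for i in writers)
```

Provided `writers` covers every value that occurs in the trace, `holds_by_case_split` agrees with checking the property at every step, but each individual case only looks at the steps attributed to one writer.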

Abstract Interpretation. Finally, we wish to reduce the verification of each 
parameterized property to a set of tractable model checking problems. The 
difficulty is that there may be variables in the model with large or unbounded 
ranges (such as memory addresses) and arrays with a large or unbounded number of 
elements (such as memory arrays). We solve this problem by using abstract 
interpretation to reduce each data type to a small number of abstract values. For 
example, suppose we have a property with a parameter $i$ ranging over memory 
addresses. We reduce the type $A$ of memory addresses to a set containing two 
values: the parameter value $i$, and a symbol $A \setminus i$ representing all 
values other than $i$. In the abstract interpretation, accessing an array at 
location $i$ will produce the value of that location, whereas accessing the array 
at $A \setminus i$ produces $\bot$, a symbol representing an unknown value. 
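
A minimal sketch of this two-valued address abstraction follows. The class and constant names are hypothetical, and the sketch covers only array reads and writes; the abstractions built into the SMV proof assistant are more general.

```python
BOTTOM = object()  # stands for the unknown value
OTHER = object()   # stands for every address other than the parameter i


class AbstractArray:
    """Array abstracted relative to one parameter address i."""

    def __init__(self, i, initial=BOTTOM):
        self.i = i
        self.value_at_i = initial  # the only element that is tracked

    def abstract(self, addr):
        """Map a concrete address to its abstract value."""
        return self.i if addr == self.i else OTHER

    def read(self, abs_addr):
        """Reading at i gives the tracked value; anywhere else gives BOTTOM."""
        return self.value_at_i if abs_addr == self.i else BOTTOM

    def write(self, abs_addr, value):
        """Writes to other addresses cannot change the element at i."""
        if abs_addr == self.i:
            self.value_at_i = value
```

Only one array element is ever stored, regardless of how large the concrete address space is, which is what makes the resulting model finite.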

In effect, for each time the user “splits cases” on a variable of a given type, 
there is one value in the abstract type and one element in each abstracted array 
indexed by that type. If there are two parameters $i$ and $j$ of type $A$, the 
proof assistant may split the problem into two cases: one where $i = j$ and one 
where $i \neq j$. Alternatively, it may consider separately the cases $i < j$, 
$i = j$ and $i > j$, if information about the order of these values is important 
to the property. 

The abstractions used by the proof assistant are sound, in the sense that 
validity of a formula in the abstract interpretation implies validity in the concrete 
model for all valuations of the parameters. Of course, the abstraction may be too 
coarse to verify the given property (i.e., the truth value in the abstract model 
may be $\bot$) even though the property is true. Note, however, that the user does 
not need to verify the correctness of the abstraction, since this is drawn from a 
fixed set built into the proof assistant. 

¹ In some cases we can also assume that another refinement relation holds for all times 
less than or equal to $t$, provided we do not do this in a circular way. 

The proof process proceeds as follows. First, the user specifies refinement 
relations (and other lemmas, as necessary), which are proved by mutual temporal 
induction. These properties are parameterized by “splitting cases” on appropri- 
ate variables, so that any particular case depends on only a finite part of the 
system state. Finally, the proof assistant abstracts the model relative to the 
parameter values, reducing the types with large or unbounded ranges to small 
finite sets. The resulting proof obligations are discharged by a model checker. 

We now consider how this methodology can be applied to processor microar- 
chitectures with features such as speculative execution, out-of-order execution 
and load-store buffers. 

3 The Processor Model 

The processor microarchitecture that we model has out-of-order, speculative 
execution using a variant of Tomasulo’s algorithm with a reorder buffer. It 
implements branch prediction and precise exceptions, and has an out-of-order 
load-store buffer with load forwarding. For simplicity, we separate program and 
data memories. The model is generic, in that many functions, such as the ALU 
(arithmetic-logic unit) and the instruction decoder have been replaced by unin- 
terpreted function symbols. A specific ISA may be implemented by defining these 
functions appropriately. Our proof, however, is independent of these functions. 

3.1 The Specification 

The microarchitecture is specified with respect to a reference model, which exe- 
cutes one instruction per step in program order. The ISA consists of the following 
instruction classes. A load (LD) takes two register operands, source address and 
destination. It reads data memory at the source address, and loads the value into 
the destination register. A store (ST) takes two register operands, the source and 
the destination address. It stores the source value at the destination address in 
data memory. An ALU operation (ALU) takes two register operands and a des- 
tination register. This generic instruction models all the instructions using the 
ALU by a single uninterpreted function. Although we do not explicitly model im- 
mediate operands, these can be folded into the generic ALU function. A branch 
(BC) performs a test on its two register operands. If true, it sets the program 
counter to the branch target value. Both the test and the branch target compu- 
tation are modeled by uninterpreted functions. A jump (JMP) sets the program 
counter to the address in the source register. This is to implement non-local 
jumps such as returns from exception handlers. Finally, an output operation 
(OUT) sends its register operand to the processor’s output port. The LD, ST 
and ALU operations can cause an exception to be raised, in which case control 
is transferred to the exception handler address. Asynchronous interrupts are not
modeled. 
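
To make the reference model concrete, here is a minimal Python sketch of a one-instruction-per-step interpreter for the instruction classes listed above. It is an illustration only: the instruction encoding, the class name RefModel, and the choice to pass the uninterpreted functions as callables are our assumptions, and exceptions are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed encoding: (opcode, register a, register b, destination register).
Instr = Tuple[str, int, int, int]


@dataclass
class RefModel:
    alu: Callable[[int, int], int]    # stands in for the uninterpreted ALU function
    test: Callable[[int, int], bool]  # stands in for the branch test
    target: Callable[[int], int]      # stands in for the branch-target computation
    incr: Callable[[int], int]        # stands in for PC incrementation
    pc: int = 0
    regs: Dict[int, int] = field(default_factory=dict)
    mem: Dict[int, int] = field(default_factory=dict)
    out: List[int] = field(default_factory=list)

    def step(self, prog: Dict[int, Instr]) -> None:
        """Execute exactly one instruction, atomically and in program order."""
        op, a, b, d = prog[self.pc]
        va, vb = self.regs.get(a, 0), self.regs.get(b, 0)
        next_pc = self.incr(self.pc)
        if op == "LD":        # a holds the source address, d is the destination
            self.regs[d] = self.mem.get(va, 0)
        elif op == "ST":      # a holds the source value, b the destination address
            self.mem[vb] = va
        elif op == "ALU":
            self.regs[d] = self.alu(va, vb)
        elif op == "BC":
            if self.test(va, vb):
                next_pc = self.target(self.pc)
        elif op == "JMP":
            next_pc = va
        elif op == "OUT":
            self.out.append(va)
        self.pc = next_pc


def run_demo() -> List[int]:
    """Compute a value, store it, load it back and output it."""
    prog = {
        0: ("ALU", 0, 0, 1),  # r1 := alu(r0, r0)
        1: ("ST", 1, 0, 0),   # mem[r0] := r1
        2: ("LD", 0, 0, 2),   # r2 := mem[r0]
        3: ("OUT", 2, 0, 0),  # output r2
    }
    model = RefModel(alu=lambda x, y: x + y + 1, test=lambda x, y: x == y,
                     target=lambda pc: 0, incr=lambda pc: pc + 1)
    for _ in range(4):
        model.step(prog)
    return model.out
```

Because the ALU, branch test, branch target and increment are parameters, any concrete ISA can be plugged in without changing the interpreter, which is the sense in which the model is generic.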



3.2 The Implementation Model 

The microarchitecture is depicted in Fig. 1. It is out-of-order, in that instruc-
tions are executed when their operands are available, not necessarily in program
order. Instruction execution begins by fetching the instruction from program
memory at the program counter address (PC). The instruction is then decoded
to determine the operation type, the operand registers, the branch target, etc.
The program counter is updated by incrementing its current value. Since the
increment depends on the instruction width, we model incrementation by an
uninterpreted function. In case of a conditional branch, however, the branch
predictor guesses the value of the branch condition. Thus we continue fetching
instructions even though the actual branch condition is not yet known, at the
risk of having to cancel the ensuing instructions if the guess is incorrect. If the
predicted branch condition is true, the PC is loaded from the branch target.
Since branch predictions do not affect correctness, the branch predictor is mod-
eled as a non-deterministic choice, though this can be replaced by any desired
function.

Fig. 1. Microarchitecture.

The instruction then reads its source operands from the register file, and 
is loaded into the next available reservation station (RS) to await execution. A 
source register may contain an actual data value, or it may contain a tag, pointing 
to the RS that will produce the data value when it completes. In the case of a tag, 
the RS must wait until the corresponding data value returns on the result bus 
(RES). When both operand values are available, the instruction may be issued 
to an execution unit. When the result of the operation is computed, it returns on 
the result bus, with its tag, and may be forwarded to any instructions holding
that tag. The result is stored in the reorder buffer (RB) until the instruction 
retires. At retirement, the result is written to the register file. Instructions are 
retired in program order, so that the state of the register file is always consistent. 
This allows clean recovery from exceptions or mispredicted branches. 
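
The tag mechanism just described can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions (the class and function names are hypothetical, and the reorder buffer and retirement are left out): a register either holds a value or names the RS that will produce it, and a result appearing on the result bus is forwarded to every operand waiting on that tag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operand:
    valid: bool                 # True when val holds an actual data value
    val: Optional[int] = None
    tag: Optional[int] = None   # index of the RS that will produce the value


@dataclass
class RegEntry:
    val: int = 0
    resvd: bool = False         # True while a write to this register is pending
    tag: Optional[int] = None   # RS that will perform the pending write


def read_operand(reg: RegEntry) -> Operand:
    """Read a source register: either its value or the tag of its producer."""
    if reg.resvd:
        return Operand(valid=False, tag=reg.tag)
    return Operand(valid=True, val=reg.val)


def broadcast(stations: List[List[Operand]], tag: int, value: int) -> None:
    """Result bus: forward value to every operand still waiting on tag."""
    for operands in stations:
        for opnd in operands:
            if not opnd.valid and opnd.tag == tag:
                opnd.valid, opnd.val, opnd.tag = True, value, None


def ready(operands: List[Operand]) -> bool:
    """An instruction may be issued once all of its operands are valid."""
    return all(opnd.valid for opnd in operands)
```

For example, an instruction whose second source register is reserved by RS 3 is not ready until broadcast delivers the result tagged 3.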

When a branch instruction retires, we compare the computed value of the 
branch condition to the predicted value. If these are not the same, subsequent 
instructions may have been fetched from an incorrect program counter. Thus, 
they must be flushed. When this happens, the program counter is set to the 
alternative that was not chosen at fetch time. 

Load and store operations are recorded in a load-store buffer (LSQ) in pro-
gram order. In our model, this buffer is unbounded; however, it could be refined
by any fixed-size buffer. Loads and stores are not necessarily executed in program
order. A load operation may execute after it has issued (i.e., its operands have
been obtained) and after all earlier stores to the same address have executed.
Alternatively, a load instruction may execute by forwarding the data value from
the most recent store to that address, even if that store has not yet executed. A
store instruction can execute after it has issued, and after all previous loads and
stores to the same address have executed.²

The above conditions avoid the classic hazard conditions (RAW, WAR and 
WAW), guaranteeing correct operation even when operations occur out of pro- 
gram order. In addition, we must ensure that a store cannot execute until the 
instruction has actually retired, since the store cannot be undone if the instruc-
tion were to be flushed. When a store instruction retires, it is marked committed
in the load-store buffer, and cannot subsequently be flushed. The choice of which 
available operation to execute is non-deterministic, though this could be replaced 
by any desired scheduling policy. 
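
The execution conditions for the load-store buffer can be written as small predicates. The Python sketch below is our own reading of the rules above, with hypothetical names; an address of None means the operation has not yet issued, and an earlier operation whose address is still unknown is treated as a potential conflict.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Entry:
    kind: str                    # "LD" or "ST"
    addr: Optional[int] = None   # None until the operation has issued
    executed: bool = False
    committed: bool = False      # stores only: set when the instruction retires


def blocked_by(queue: List[Entry], idx: int, kinds: Set[str]) -> bool:
    """True if an earlier, unexecuted entry of one of kinds may touch the same address."""
    me = queue[idx]
    for earlier in queue[:idx]:
        if earlier.kind in kinds and not earlier.executed:
            if earlier.addr is None or earlier.addr == me.addr:
                return True
    return False


def load_can_read_memory(queue: List[Entry], idx: int) -> bool:
    """A load reads memory once issued and once earlier stores to its address are done."""
    return queue[idx].addr is not None and not blocked_by(queue, idx, {"ST"})


def forwarding_store(queue: List[Entry], idx: int) -> Optional[int]:
    """Queue position of the most recent earlier store to the same address, if known."""
    me = queue[idx]
    if me.addr is None:
        return None
    for j in range(idx - 1, -1, -1):
        earlier = queue[j]
        if earlier.kind == "ST":
            if earlier.addr is None:
                return None      # unknown address: the forwarding source is undecided
            if earlier.addr == me.addr:
                return j
    return None


def store_can_execute(queue: List[Entry], idx: int) -> bool:
    """A store executes only when issued, committed, and not blocked by earlier accesses."""
    me = queue[idx]
    return (me.addr is not None and me.committed
            and not blocked_by(queue, idx, {"LD", "ST"}))
```

Requiring committed in store_can_execute reflects the rule that a store must not reach memory before its instruction has retired.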



4 Verification 



Our correctness criterion is that the sequence of output values produced by 
the reference model and the microarchitecture model should be the same, for 
corresponding initial states. The reference model chooses non-deterministically 
at each time whether to take a step. By witnessing this choice, we align the 
reference model’s operation temporally with that of the implementation. 

The two most interesting aspects of the proof deal with speculative execution
and with partially ordered operations, such as register reads/writes or memory
loads/stores. We introduce proof decompositions to handle these situations, us-
ing compositional model checking.³



² Note this implies that the actual address operands of all earlier stores (and loads)
must be known before a load (store) can execute.

³ Proof and prover may be found at http://www-cad.eecs.berkeley.edu/~kenmcmil

4.1 Specifying Refinement Relations

Our basic approach is to decompose the proof into “units of work”, in this case
instructions. We prove correctness of a single instruction, relative to the refer- 
ence model, given that all earlier instructions execute correctly. To reduce the 
verification complexity, we may further decompose the instruction into smaller 
steps, such as operand read, result computation, memory load, etc. We then 
write refinement relations, specifying the data values at various points in the 
implementation, in terms of the reference model. 

Of course, to specify data items in the implementation, we must determine 
their correct values. This is done by defining auxiliary variables that record the 
correct data values as computed by the reference model. For example, when an 
instruction is fetched, the reference model executes it atomically, computing the 
correct operand and result values. The instruction is then stored in an RS. We 
record the correct operands and result for that RS. For example, here is the 
SMV code that does this: 

if(¬stallout ∧ iopin in {ALU,LD,ST,BC}){
  next(aux[st_choice].opra) := opra;
  next(aux[st_choice].oprb) := oprb;
  next(aux[st_choice].res) := res;}

Here, st_choice is the index of the reservation station, and opra, oprb and res
are values from the reference model. We now specify that, when the reservation 
station holds an operand value, it is equal to the stored correct value in the aux 
structure (and similarly for result values). 

To do this, we must take into account speculative execution. That is, if an 
instruction occurs after an exception or a mispredicted branch, we say it is shad- 
owed. A shadowed instruction does not correspond to any instruction executed 
by the reference model. Thus we cannot specify its correct operand and result 
values. In fact, these values are spurious, and must never affect the register file 
or memory. To write refinement relations, we must know whether an instruction 
in the implementation is shadowed. Fortunately, this is easy to determine. We 
set an auxiliary state bit shadow when the predicted branch condition differs 
from the correct branch condition, or when an exception occurs. The shadow bit 
is cleared when a flush occurs. Here is the SMV description: 

init(shadow) := 0;
next(shadow) := ¬flush ∧ (shadow ∨
    ¬stallarch ∧ (exn_raised ∨ (opin = BC ∧ taken ≠ itaken)));

Here, taken is the correct branch condition (from the reference model) and itaken 
is the predicted branch condition. Now, any instruction fetched while shadow is 
true is marked shadowed, by setting the auxiliary bit aux[st_choice].shadow.
While shadow is set, we stall the reference model, since no valid instructions are 
being executed. Now we write the refinement relation for operands. We specify 
that if a non-shadowed RS holds an operand value, it must be the correct value. 
Here is the specification for the a operand: 
forall(k in TAG) layer lemma1 :
  if(st[k].valid ∧ st[k].opra.valid ∧ ¬aux[k].shadow)
    st[k].opra.val := aux[k].opra;

This specifies the a operand value for RS k, when it is valid (holding an instruc-
tion), and when the a operand is valid, and when it is not shadowed. Otherwise
the value is unspecified. We can write a similar specification for the result value, 
and for other data values in the machine as necessary. 
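
For readers who prefer executable notation, the shadow-bit update and the operand refinement relation can be restated as Python predicates. This is only a paraphrase of the SMV text under our reading of it; the function names and the boolean is_branch parameter (standing for opin = BC) are ours.

```python
def next_shadow(shadow: bool, flush: bool, stallarch: bool,
                exn_raised: bool, is_branch: bool,
                taken: bool, itaken: bool) -> bool:
    """Next value of the shadow bit: cleared by a flush, otherwise sticky, and set
    by an exception or a mispredicted branch when the reference model is not stalled."""
    mispredicted = is_branch and taken != itaken
    return (not flush) and (shadow or ((not stallarch) and (exn_raised or mispredicted)))


def operand_ok(st_valid: bool, opra_valid: bool, shadowed: bool,
               opra_val: int, aux_opra: int) -> bool:
    """Refinement relation for the a operand: only a valid, unshadowed RS holding a
    valid operand is constrained; in every other case the value is unspecified."""
    if st_valid and opra_valid and not shadowed:
        return opra_val == aux_opra
    return True
```

Note that operand_ok holds vacuously for a shadowed station, which is exactly why spurious values computed after a misprediction do not violate the specification.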

4.2 Verifying Operand Correctness 

Now we must verify the above lemma. To verify data, we split cases on the 
possible sources of the data. Here, an operand value we read is generated by 
the most recent instruction to write the source register. We can identify this 
instruction’s RS by recording the tag of the most recent RS to write each register. 
We then assume, by induction, that results computed at earlier times are correct. 
We need one additional fact, however: that the most recent writer in execution 
order is in fact the most recent writer in program order. If this is the case, then 
we must read the same value read by the reference model. 

One way to establish this is to split cases on both the most recent writer in 
the implementation and the most recent writer in program order. Since the im- 
plementation retires instructions in program order, these two must be the same, 
hence correct values are always read. However, there is a complexity problem: 
the abstraction in this case will involve three distinct tag values, and hence the 
states of three distinct RS’s. In practice, we found the time and space required 
to verify this model prohibitive. Instead, we used an intermediate lemma to sim-
plify the problem. We observed that a register value is only read when no writes
to the register are pending, in which case its value is up-to-date with respect to the
reference model. Thus, we specified the register contents as follows: 

forall(i in REG) layer uptodateReg :
  if(¬ir[i].resvd) ir[i].val := r[i];

That is, if no write is pending to register ir[i], its value matches reference model 
register r[i]. This is verified using the case split described above, which is given 
to SMV as follows: 

subcase uptodateReg[i][k][c] of ir[j].val//uptodateReg
  for auxLastIssuedRS[j] = i ∧ auxLastWriterRS[j] = k ∧ r[j] = c;

That is, we let i be the last writer to register j in program order, k the last 
writer in the implementation, and c the correct data value. In this case there are 
only two distinguished tag values, i and k, so the abstraction contains only two 
RS’s. 

In fact, the first attempt to check this property produced a counterexample 
in which some abstracted instruction causes a flush, cancelling the instruction 
that should write register j. The abstract model allows this because the states 
of RS’s other than i and k are unknown. To deal with this, we introduce a 
non-interference lemma, stating that no unshadowed instruction is flushed: 
forall(i in TAG) lemma5[i] : assert G
  (flush ⇒ shadow ∧ (completest ≠ i ⇒ ¬(st[i].valid ∧ ¬aux[i].shadow)));

Here, completest is the tag of the RS causing the flush. We prove this by split- 
ting cases on the flushing instruction. This eliminates the above counterexample 
to the up-to-date register property, leaving another counterexample in which 
a shadowed instruction writes register j and corrupts its value. This calls for 
another lemma stating that no shadowed instruction retires: 

lemmaS : assert G (retiring ⇒ ¬aux[completest].shadow);

This can be proved by splitting cases on the currently retiring instruction and 
the instruction that set the shadow bit (e.g., a mispredicted branch). That is, the
latter must retire and cause a flush before the shadowed instruction can retire. 
With this additional lemma, the up-to-date register property is verified. Now 
operand correctness is easily proved by splitting cases on the source register and 
the operand’s tag, which indicates the data source when forwarding from the 
result bus: 

subcase lemma1[i][j][c] of st[k].opra.val//lemma1
  for st[k].opra.tag = i ∧ aux[k].srca = j ∧ aux[k].opra = c;

The specification for results returning from execution units can be verified us- 
ing operand correctness. This requires a non-interference lemma stating that 
unexpected results are never returned. 

4.3 Verifying Memory Data Correctness

We also specify the results returning from the data memory, as follows:

lemma7 : assert G (¬mqaux[mq_head].shadow ∧ mem_ld ∧ mem_enable
    ∧ load_from_mem ⇒ mem_rd_data = mqaux[mq_head].data);

Here, mq_head points to the currently executing operation in the load-store
queue. That is, if the current operation is an unshadowed load, then the data 
from memory are the correct data stored in the auxiliary array mqaux. We break 
this into two cases - when data are read from memory and when data are for- 
warded from the load-store queue. Here we consider only the former case, al- 
though the latter is similar. 

This property is similar to the one specifying values read from the register 
file. Here, we must prove that, for any load, the most recently executed store to 
the same address (call it Se) is also the most recent in program order (call it 
Sp). As before, we use auxiliary variables to identify Se and Sp in the queue. 
Splitting cases on these two stores and the current load, we should be able to 
prove that Se and Sp are the same, hence read data are correct. 

Unfortunately, the abstract model with two stores and one load is too large to 
model check. We cannot solve this problem as before by writing an “up-to-date” 
lemma for the memory, since we may read the memory when it is not up-to-date. 
Instead, we split cases only on the current load L and on Se. This produces a
counterexample in which Se < Sp < L in program order. That is, at the time L
occurs, Se has executed but Sp has not. This cannot really happen, because the
unexecuted store Sp would block load L. However, since Sp is abstracted, this
information is lost. To avoid splitting cases on Sp, we simply state as a lemma
that Sp ≤ Se. In SMV, we say:

lemma7a : assert G (¬mqaux[mq_head].shadow ∧ mem_ld ∧ mem_enable
    ∧ load_from_mem ⇒ (imtag[mem_addr] ≥ mqaux[mq_head].lastWrite));

Here imtag[mem_addr] is Se, while mqaux[mq_head].lastWrite is Sp. This can
be proved using another lemma, stating that stores always occur in program or- 
der. All three properties can be proved using just two memory queue elements. 
We reduce the problem further by writing a refinement relation for the data in 
the load-store queue. This allows us to abstract out the RS’s when proving mem- 
ory properties. This required a lemma stating that unshadowed queue elements 
are never flushed, which follows directly from the fact that unshadowed RS’s 
are never flushed. The resulting abstract models can be handled easily by the 
model checker. At the cost of additional lemmas, we have reduced an intractable 
problem to a tractable one. 
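
The ordering argument of this section can be checked by brute force on small queues. The following Python sketch is not part of the SMV proof; it is an independent illustration with hypothetical names, restricted to operations on a single address and to loads that read memory (no forwarding). It enumerates every execution order permitted by the rules of Section 3.2 and checks that, whenever a load executes, the most recently executed store (Se) is the most recent earlier store in program order (Sp).

```python
from itertools import permutations
from typing import List, Sequence


def legal(order: Sequence[int], kinds: List[str], store_waits_for_loads: bool) -> bool:
    """Is this execution order allowed? Positions are program order; all operations
    are assumed to hit one address."""
    done = set()
    for i in order:
        earlier = range(i)
        if kinds[i] == "LD":
            blockers = [j for j in earlier if kinds[j] == "ST"]
        elif store_waits_for_loads:
            blockers = list(earlier)  # earlier loads and stores
        else:
            blockers = [j for j in earlier if kinds[j] == "ST"]
        if any(j not in done for j in blockers):
            return False
        done.add(i)
    return True


def loads_read_correct_store(kinds: List[str], store_waits_for_loads: bool = True) -> bool:
    """Check Se == Sp at every load, over all legal execution orders."""
    for order in permutations(range(len(kinds))):
        if not legal(order, kinds, store_waits_for_loads):
            continue
        last_executed_store = -1      # Se; -1 stands for the initial memory contents
        for i in order:
            if kinds[i] == "ST":
                last_executed_store = i
            else:
                sp = max((j for j in range(i) if kinds[j] == "ST"), default=-1)
                if last_executed_store != sp:
                    return False
    return True
```

Dropping the requirement that a store wait for earlier loads (store_waits_for_loads=False) makes the check fail on the queue ["LD", "ST"], since the store may then overwrite memory before the earlier load has read it.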



4.4 Remaining Steps 

For the program counter (PC), we write a refinement relation stating that, when 
the shadow bit is not set, the implementation PC equals the reference model PC: 

layer opok : if(¬shadow) ipc := pc;

Since the PC can be loaded from an RS (in case of a flush) or from a register 
(for a JMP), we split cases on the most recent reservation station to and on the 
source register of the previous instruction. We also use the two lemmas about 
speculation. Further refinement relations specify the decoded instruction and 
branch target. This isolates the uninterpreted functions computing these values. 

Finally, we must prove our overall correctness criterion, correctness of out- 
puts. The OUT instruction reads a register and sends its value to the output 
port. Thus, the up-to-date register property suffices to prove output correctness. 
Overall, the proof⁴ consists of the following elements: (1) refinement maps for the
program counter, instruction decoder, register file, RS’s and load-store queue,
(2) two non-interference lemmas for speculative execution, two for the result bus,
and four for the load-store queue, (3) case splitting instructions for the above and
hints for adjusting the abstractions, and (4) auxiliary variable declarations. All 
told, this information comprises less than 18K bytes, somewhat less than the 
size of the microarchitecture model and its specification. 

⁴ By “proof”, we mean all the input used to guide a mechanical prover, and not a
proof in the mathematical sense.

Table 1. Proof Size vs. Feature Set.

Model                        Proof size
A (baseline)                 5700 bytes
B = A + out-of-order         7000 bytes
C = B + speculation          13K bytes
D = C + load-store buffer    18K bytes



To summarize, our strategy is to reduce the verification problem to “units of
work”, in this case instructions. Since each instruction uses only finite resources,
we can verify its correctness using a finite abstraction of the system. We identify
the resources used by the instruction (e.g., RS’s, registers, etc.), by introducing
auxiliary variables. Once we “split cases” on these resources, the pointer types 
and arrays are automatically reduced, yielding a finite abstract model. 

The novel aspects of this proof are in the treatment of speculation, and of 
read/write hazards. We handled speculation by introducing an auxiliary shadow 
bit for each instruction in the machine. We then show two key facts about the 
system: that unshadowed instructions are never canceled, and that shadowed 
instructions never retire. To handle read/write hazards, we use an abstraction 
strong enough to prove that the most recent writes to an address in execution 
and program order are the same. 

Finally, to address the question of scalability, we consider four designs of 
increasing complexity: design A is a simple in-order processor, design B adds 
Tomasulo’s algorithm for out-of-order execution, design C adds speculative exe-
cution and design D adds a load-store buffer. Table 1 shows the textual size of the
proofs we obtained for these four designs. Adding Tomasulo’s algorithm is the 
simplest step, involving only a few additional case splits and two non-interference 
lemmas. Adding speculation and the load-store buffer is more complex, because 
of the register and memory ordering properties we must prove. Nonetheless, we 
find that the complexity of the interactions between these features does not 
make the proof intractable. Rather, the proof increment associated with adding 
a feature remains moderate, at least for this example. 
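
The per-feature proof increments follow directly from the sizes in Table 1. The short Python calculation below makes the arithmetic explicit; reading "K" as 1000 bytes is our assumption, and the variable names are ours.

```python
# Proof sizes in bytes, taken from Table 1 ("K" read as 1000; an assumption).
sizes = {"A": 5700, "B": 7000, "C": 13_000, "D": 18_000}
order = ["A", "B", "C", "D"]

# Increment attributed to the feature added at each step.
increments = {new: sizes[new] - sizes[old] for old, new in zip(order, order[1:])}

for design, delta in increments.items():
    print(f"design {design}: +{delta} bytes")
```

Under this reading, out-of-order execution adds about 1300 bytes of proof, speculation about 6000 and the load-store buffer about 5000, so no single feature more than doubles the proof.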

5 Comparison with Other Approaches 

We now compare our proof with proofs of similar microarchitecture models us-
ing other methods. We consider proofs by Sawada and Hunt [SH98], Velev and
Bryant [VB00] and Hosabettu et al. [HGS00]. All of these proofs are variations
in some form on the method of Burch and Dill [BD94], in which an abstraction
function is constructed by “flushing” the implementation, i.e., inserting null op- 
erations until all pending instructions are completed. This yields a “clean” state 
which can be compared to the reference model state. One then proves a commu- 
tative diagram, that is, that taking one implementation step and then applying 
the abstraction function yields the same state as applying the abstraction func- 
tion followed by zero or more reference model steps. This can be done in an 
almost fully automated way for simple pipelines, and has the advantage that the
abstraction function is mechanically constructed. 
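
The commutative diagram can be illustrated on a toy pipeline. The Python sketch below is our own minimal example, not taken from any of the cited proofs: the implementation holds one pending register write in a latch, the flushing function simply completes that write, and the check enumerates all states of a two-register machine. The function f stands in for an uninterpreted ALU operation, and the bypass flag lets us inject a stale-operand bug.

```python
from itertools import product

N_REGS, VALUES = 2, range(3)


def f(x):
    """Stand-in for an uninterpreted ALU function."""
    return (x + 1) % 3


def writeback(regs, latch):
    """Apply a pending write (dst, val) to the register tuple, if there is one."""
    if latch is None:
        return regs
    dst, val = latch
    return tuple(val if r == dst else v for r, v in enumerate(regs))


def spec_step(regs, instr):
    """Reference model: regs[dst] := f(regs[src]), executed atomically."""
    dst, src = instr
    return writeback(regs, (dst, f(regs[src])))


def impl_step(state, instr, bypass=True):
    """Two-stage implementation: retire the latched write, then latch the new result."""
    regs, latch = state
    dst, src = instr
    new_regs = writeback(regs, latch)
    operand = new_regs[src] if bypass else regs[src]  # no bypass reads a stale value
    return (new_regs, (dst, f(operand)))


def flush(state):
    """Abstraction function: complete the pending instruction."""
    regs, latch = state
    return writeback(regs, latch)


def commutes(bypass=True):
    """Check flush(impl_step(s, i)) == spec_step(flush(s), i) for all states and instructions."""
    latches = [None] + [(d, v) for d in range(N_REGS) for v in VALUES]
    instrs = list(product(range(N_REGS), repeat=2))
    for regs in product(VALUES, repeat=N_REGS):
        for latch, instr in product(latches, instrs):
            state = (regs, latch)
            if flush(impl_step(state, instr, bypass)) != spec_step(flush(state), instr):
                return False
    return True
```

With bypassing the diagram commutes for every state, so no reachability invariant is needed in this toy case; without bypassing the check fails, for instance when the latched write targets the source register of the next instruction.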

However, the method has two distinct disadvantages. First, for complex archi- 
tectures, the abstraction function is generally not strong enough to be inductively 
invariant. It must be manually strengthened with information about reachabil- 
ity of control states. In our method, no such information is required. Second, 
the abstraction function depends on the entire machine state, including all
the instructions that are currently in the machine. For complex architectures, 
it becomes intractable to deal with it automatically. In our method, we reason 
about only one or two instructions. Thus, the proof obligations are local, and can 
be handled by model checking. By contrast, most recent work using abstraction 
functions manually decomposes the flushing function into smaller, more tractable 
parts. Thus the Burch and Dill method’s advantage of full automation is lost. To 
see this, we consider the extant proofs in more detail. A comparison of textual 
sizes of models and proofs is given in Table 2.



Sawada and Hunt. The work of Sawada and Hunt is perhaps the first
formal proof of a “modern” microprocessor architecture. Their processor model
uses Tomasulo’s algorithm, branch prediction, precise exceptions and a load-
store buffer with forwarding. The model is qualitatively similar to ours, with
a few differences. They model asynchronous interrupts, while we do not. They 
use a fixed set of execution units (one per instruction type) while we do not. 
Thus, they associate RS’s statically with execution units, while we choose the 
execution unit at issue time, to maximize use of the execution units. Also, their 
load-store buffer holds two loads and one store, while we model an arbitrary 
number of entries. 

The model is defined by a collection of Common LISP functions in the the-
orem prover ACL2 [KM96]. We report in Table 2 the approximate textual size
of the functions describing the processor architecture, excluding theorems and 
generic functions not related to processor modeling. This is roughly three times 
the textual size of our model in the SMV language. In our estimation, this differ- 
ence is largely accounted for by the greater conciseness of the SMV language as a 
hardware description language. However, some details present in the Sawada and 
Hunt model, such as an explicit instruction decoding function, are not present in 
our model, since we model them generically using uninterpreted functions. Defin- 
ing these functions explicitly would increase the description size, but would not 
affect the proof. 

Sawada and Hunt use an intermediate abstraction called a MAETT, a table
tracking the status of all instructions being executed in the machine. They
then relate the MAETT to the implementation and the reference model using 
invariants, which are proved by induction. We do not use an intermediate ab- 
straction, although our auxiliary variables do contain information similar to that 
in the MAETT. The chief difficulty reported by Sawada and Hunt is that the 
invariant must be strengthened by auxiliary invariants of the implementation 
state. No such invariants occur in our proof (although we do need a few lemmas 
concerning which events may occur in certain states). This leads to a stark dif-
ference in the textual size of the proofs: their proof (for the FM9801 processor)
is roughly 1909K bytes, of which nearly a megabyte is the inductive invariant.
Our proof is less than 20K bytes, smaller than the model description itself. This
difference of two orders of magnitude is more than enough to account for differ- 
ences in models, the succinctness of representation, whitespace, etc. By another 
measure, the Sawada and Hunt proof has roughly 4000 lemmas, whereas ours 
has approximately 18 (depending on how one counts). 



Velev and Bryant. The approach of Velev and Bryant [VB00] is closely based
on the Burch-Dill technique. They focus on efficiently checking the commutativ- 
ity condition for complex microarchitectures by reducing the problem to checking 
equivalence of two terms in a logic with equality, and uninterpreted function sym- 
bols. Under certain conditions, their decision algorithm is able to check equiv- 
alence of the massive formulas obtained from flushing complex models. Some 
manual work is required, however, to put the problem in a form suitable for the 
tool. They handle architectures with deep and multiple pipelines, multiple-issue, 
multi-cycle execution units, exceptions and branch prediction, for fixed finite 
models (note, we treat models with unbounded resources). Notably, they do not 
treat out-of-order execution, or load-store buffers. We conjecture that this is due 
to the complexity of the flushing functions, and the need for complex auxiliary 
invariants in these cases. 



Hosabettu et al. Hosabettu et al. have published a series of papers on mi- 
croprocessor verification, based on the “completion functions” approach. The 
microarchitecture they model in [HGS00] is similar to ours in that it has out-of-
order execution, branch prediction, precise exceptions and it buffers stores (but
not loads, which are atomic). Stores are executed in program order, while in our 
model they can be out-of-order. Also, they model a processor status word, while 
we do not. 

Hosabettu et al. prove a commutative diagram, but decompose the abstrac- 
tion function into completion functions for each instruction in the machine. A 
completion function specifies the future effect of an unfinished instruction on the 
observable state. They define completion functions for each instruction type, in 
terms of the present status of the instruction in the machine, and also whether 
that instruction will squash subsequent instructions, ensuring they do not affect 
the program state. The abstraction function is the composition of the comple-
tion functions. A commutative diagram is proved using PVS [ORSvH95] for the
decomposed abstraction function. 

This approach has the advantage of avoiding applying a decision procedure 
to the entire flushing function. However, proofs of the commutativity obligations 
require auxiliary invariants that characterize the reachable states of the model. 
To reason about the composite abstraction function, one must enumerate man- 
ually the various instructions in a particular state, the exact transitions they 
might make, the position of the “squashing” instruction, and so on. While de-
composing the abstraction function makes reasoning about each case simpler,
considerable manual effort is still required in stating invariants and guiding the
prover.

Table 2. Textual sizes of the Models and Proofs.

Technique Used                  Proof Assistant   Size of Machine Spec   Size of Proof
Sawada & Hunt [SH98]            ACL2              ~60K bytes             1909K bytes
Hosabettu et al. [HGS00]        PVS               ~70K bytes             ~2300K bytes
Compositional Model Checking    SMV               20K bytes              18K bytes

The authors report that the proof took much less time than that of Sawada 
and Hunt. However, the textual size is comparable. The proof uses approxi-
mately 300K bytes of PVS specifications, and 2000K bytes of proof script (man-
ual prover guidance). The latter, while generated manually, contains considerable
redundancy. Thus its large size may not accurately reflect the effort needed to 
create it. We conjecture the large proof size results from the need for auxiliary 
invariants, and the theorem prover’s greater need for manual guidance vis-a-vis 
model checkers. 

6 Conclusion 

We have shown that compositional model checking methods can verify a pro- 
cessor microarchitecture with most of the architectural features of a modern 
microprocessor. We introduced proof strategies to handle speculative execution 
(using shadow bits) and to handle read/write hazards (case splitting on the
most recent writes in program and execution order). The proof methodology 
scales well in that the incremental proof cost associated with each processor fea- 
ture is low. Moreover, the proof is concise relative to proofs using other methods 
(and is smaller than the model description itself). Although proof size is not 
necessarily an indication of the human effort required, we consider the difference 
of two orders of magnitude to reflect a qualitative difference in proof complexity. 
We ascribe this difference to several factors. 

First, as reported both by Sawada and Hunt and by Hosabettu et al., one
of the most time consuming aspects of their methods is specifying auxiliary 
invariants. We exploit the model checker’s ability to compute reachable states to 
avoid writing such invariants. Second, by stating refinement relations as temporal 
properties we can decompose the proof into “units of work”, such as instructions,
that are temporally and spatially distributed but use finite resources. This avoids 
reasoning about the entire state of the machine, and allows us to use small, finite- 
state abstractions. Finally, we exploit the fact that model checkers require less 
manual guidance than theorem provers do. 

Nonetheless, there remains much room for improvement. For example, some 
lemmas in our proof could be eliminated if the model checker were able to handle 
three instructions in the abstraction instead of two. We have found that the
symbolic model checker can handle abstract models with only about half the
number of state bits that can be handled with concrete models. The reason for 
this is unclear, though it may be that the abstract state spaces are less sparse, 
or that there is greater nondeterminism in the transition relation. This does not 
affect the scalability of the proof methodology, but the “constant factor” would 
be improved if the model checker could handle larger abstract models. 

To handle asynchronous interrupts, it would be useful to implement 
“prophecy variables”, so that the witness function that stalls the reference model
could depend on the future of the implementation. Also, to implement a specific 
instruction set architecture, we must substitute concrete functions for the un- 
interpreted functions in our model. Support for this is currently lacking in the 
prover, though it would be straightforward to implement. 

On the whole, although proofs of this sort are considerably more laborious 
than model checking finite state machines, we feel that the methodology scales 
well, and that additional processor features, such as a first-level cache, an address 
translation unit, or multiple issue, could be handled in a straightforward manner, 
with the addition of a few lemmas for each feature. 

Abstract. We describe an algorithm for simplifying a class of symbolic 
expressions that arises in the symbolic execution of formal state machine 
models. These expressions are compositions of state access and change 
functions and if-then-else expressions, laced together with local variable 
bindings (e.g., lambda applications). The algorithm may be used in a 
stand-alone way, but is designed to be part of a larger system employ- 
ing a mix of other strategies. The algorithm generalizes to a rewriting 
algorithm that can be characterized as outside-in or lazy, with respect 
both to variable instantiation and equality replacement. The algorithm 
exploits memoization or caching. 

Keywords: Hardware modeling, verification, microprocessor simulation, 
theorem proving, pipelined machine. 



1 Relevance to Processor Modeling 



A common application of such mechanized theorem provers as ACL2, HOL, 
and PVS is the modeling and analysis of microprocessors and other state 
machines. 

The ACL2 theorem prover is particularly suited to processor modeling 
because it supports an efficient functional programming language based on 
Common Lisp. Hence, operational models formalized in ACL2 can be executed 
as processor simulators. This is not a speculative assertion. Rockwell Collins has 
constructed microarchitectural executable formal models of some of its custom 
microprocessors in ACL2. The models have been integrated into a standard 
execution environment, replacing preexisting simulators written in more com- 
mon programming languages such as C. The ACL2 models run at roughly the 
same speed as the original models. (How this is possible will become clear be- 
low). Reasoning about state machines requires symbolic simplification of terms 
representing states. Straightforward simplification algorithms can cause unnec- 
essary exponential blowups in the size of the expression. This paper presents an 
algorithm for avoiding many of those explosions. 

2 The Problem 

We present an algorithm for simplifying expressions that arise from the symbolic 
manipulation of formally described state machines. We use ACL2 term notation 
(i.e., Lisp notation). But the algorithm is of general interest in any formal setting 
where (a) terms are used to represent states, (b) “access” and “change” functions 
are provided, and (c) variable binding is present (e.g., Lisp let expressions, 
lambda applications, or, more generally, the application of defined functions). 
Our algorithm also deals with if-then-else constructs. 

For example, a state, s, might have three components, named a, b, and c. 
We write (a s) to access the a component of s and (update-a x s) to create 
a new state like s but with x as its a component. 

Of special interest are nests of updates. A simple example is shown below. 

(let ((s (update-a (new-a x s) s)))          ; [*1] 
  (let ((s (update-b (new-b x s) s))) 
    (let ((s (update-c (new-c x s) s))) 
      s))) 

Each successive let changes the assignment of the variable s. So the s in the 
new-b expression refers to the state obtained by updating the a slot of the 
“original” (free) s. 

Logically speaking, (let ((v_1 a_1) ... (v_n a_n)) b) is equal to the instance 
of b obtained by simultaneously replacing all free occurrences of each v_i by the 
corresponding a_i. It is often read “let v_1 be a_1, ..., and v_n be a_n in b,” or perhaps 
more suggestively as “b, where v_1 is a_1, ..., and v_n is a_n.” 

In ACL2, let expressions are syntactic sugar for certain lambda applica- 
tions. Roughly speaking, (let ((v_1 a_1) ... (v_n a_n)) b) is just ((lambda (v_1 
... v_n) b) a_1 ... a_n). We say “roughly speaking” because in ACL2 when we 
translate lets into lambda applications we make sure that every free variable 
of b is captured by the formal variables of the lambda (by adding extra formals 
and the corresponding actuals, as needed). 

Replacing the lets in an expression by the corresponding lambda applications 
and performing beta reduction (i.e., expanding the lambdas away) may yield an 
exponentially larger term, because of variable duplication. This happens in [*1] . 
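
The blow-up is easy to reproduce on a small scale. The following Python sketch is not part of ACL2; it models terms as nested tuples, and the helper names (subst, count_var, beta_reduce_nest) are ours. It fully beta-reduces a nest shaped like [*1] and counts the occurrences of the free variable s in the result.

```python
def subst(term, var, value):
    """Replace every occurrence of the variable `var` in `term` by `value`."""
    if term == var:
        return value
    if isinstance(term, tuple):
        # Function symbols sit at position 0 and never equal `var` here.
        return tuple(subst(arg, var, value) for arg in term)
    return term


def count_var(term, var):
    """Count occurrences of `var` in the tree denoted by `term`."""
    if term == var:
        return 1
    if isinstance(term, tuple):
        return sum(count_var(arg, var) for arg in term)
    return 0


def beta_reduce_nest(depth):
    """Expand (let ((s U1)) (let ((s U2)) ... s)), Uk = (update-k (new-k x s) s)."""
    state = "s"
    for k in range(1, depth + 1):
        update = (f"update-{k}", (f"new-{k}", "x", "s"), "s")
        state = subst(update, "s", state)   # each level mentions the old state twice
    return state


for depth in (1, 2, 3, 10):
    print(depth, count_var(beta_reduce_nest(depth), "s"))
```

Each level mentions the previous state twice, so the count doubles with every additional let.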

We use let nests to describe state transformations as sequences of assign- 
ments to the components of the state. Formal models so expressed can be exe- 
cuted efficiently. The variable symbol s in [*1] is used in a “single-threaded” 
way so that during execution on concrete data the original state may be de- 
structively modified to create the new one. This efficiency is crucial to the use 
of the model as a simulator. 

Now imagine defining a series of functions, e.g., phase1, phase2, ..., in terms 
of expressions like [*1] and using them as the “updaters” in some let expres- 
sion that produces a state. Realistic models involve many layers of definitions, 
culminating in some top-level state transition expression, e.g., (machine x s). 

We will present an algorithm for simplifying such expressions as (b (machine 
x s)) with less computation than may at first appear necessary. One could do 
this by expanding away the lets, beta reducing all the lambdas and expanding 
all the (non-recursively) defined function applications, and then applying the 
obvious accessor/update rewrite rules, possibly in a “lazy” or outside-in way. 
However, the reader is urged to dismiss the thought that complete beta reduc- 
tion (or the equivalent expansion of all non-recursively defined function defini- 
tions) is practical. Consider a C simulator for a system of interest and count 
the number of assignment statements: that is about the number of let bindings 
in the executable formal version of that model. Researchers at Rockwell Collins 
report [private communication]: 

The typical complexity of high-level language models of these machine 
architectures has a depth around 300 assignment statements. That is, the 
execution of the simulator for one microcycle can involve the execution 
of about 300 state updates, which means that the translated-into-ACL2 
model is a nest of state updates about 300 levels deep. Each “level” of 
the update nest typically contains at least two instances of state: the 
state being updated and a value being inserted typically expressed as a 
function of the state being updated. 

If state is used twice at every level, the full beta reduction of such a term would 
contain on the order of $2^{300}$ occurrences of the updaters. From such consider- 
ations we conclude that it is impractical to contemplate full beta reduction of 
such models. We thus focus on simplification in the presence of such bindings. 
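
The size estimate can be checked with a few lines of arithmetic. This Python sketch (the function name updater_occurrences is ours) counts updater symbols in the fully reduced term without building it, assuming, as in the quotation, that every level mentions the previous state exactly twice.

```python
def updater_occurrences(depth):
    """Updater symbols after full beta reduction of a nest of `depth` updates,
    assuming each level mentions the previous state exactly twice."""
    count = 0
    for _ in range(depth):
        count = 2 * count + 1   # one new updater wrapped around two copies
    return count


n = updater_occurrences(300)
print(n == 2**300 - 1)          # the recurrence solves to 2^depth - 1
print(len(str(n)), "decimal digits")
```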

3 Some Tests 

Before presenting our algorithm we will present a simple test suite for it and 
show some performance data to motivate the rest of the paper. The simple test 
here is available at 

http://www.cs.utexas.edu/users/moore/publications/nu-rewriter 

In our simple test suite, we first declare a state object s, with two fields, 
a and b, accessed by functions of those names and updated by update-a and 
update-b. We next declare three uninterpreted function symbols, v0, v1, and v2. 
Then we define phase1 to do six successive updates on s, changing the a field to 
contain a new value computed conditionally as a function of the current a field 
using the three uninterpreted functions. 

(defun phase1 (s) 
  (let ((s (update-a 
            (if (v0 1 (a s)) (v1 1 (a s)) (v2 1 (a s))) 
            s))) 
    (let ((s (update-a 
              (if (v0 2 (a s)) (v1 2 (a s)) (v2 2 (a s))) 
              s))) 
      s...))) 

Our first example, named b-phase1, is the theorem that phase1 does not change 
the contents of the b field: (equal (b (phase1 s)) (b s)). 

The second theorem, b-phase1-phase1, just composes phase1 with itself, 
(equal (b (phase1 (phase1 s))) (b s)), and could be proved trivially from 
b-phase1 except that we prevent such a proof by disabling b-phase1. 

The third theorem, a-phase1, describes the value of the a field after phase1. 

We then complicate the test by defining two more phases. Phase 0 copies 
the a field into the b field. Phase 2 copies the b field into the a field. We define 
machine to do phase0, then two phase1 steps, and then phase2. 

The fourth theorem, a-machine, shows that machine does not change the a 
field, (equal (a (machine s)) (a s)). The fifth, b-machine, shows that the 
final b field is the initial a field, (equal (b (machine s)) (a s)). 
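
As a sanity check on what the five theorems say, here is a small executable model in Python. It is only a sketch: the state is a dictionary rather than an ACL2 list, and the uninterpreted functions v0, v1, and v2 are passed in as ordinary Python functions chosen by the caller. Running it on sample interpretations tests the theorems; it does not prove them.

```python
def phase0(s):
    return {**s, "b": s["a"]}            # copy the a field into the b field


def phase1(s, v0, v1, v2):
    for k in range(1, 7):                # six successive updates of the a field
        a = s["a"]
        s = {**s, "a": v1(k, a) if v0(k, a) else v2(k, a)}
    return s


def phase2(s):
    return {**s, "a": s["b"]}            # copy the b field into the a field


def machine(s, v0, v1, v2):
    s = phase0(s)
    s = phase1(s, v0, v1, v2)
    s = phase1(s, v0, v1, v2)
    return phase2(s)


# One arbitrary interpretation of the uninterpreted symbols.
v0 = lambda k, a: (a + k) % 2 == 0
v1 = lambda k, a: 3 * a + k
v2 = lambda k, a: a - k

s0 = {"a": 5, "b": 9}
print(phase1(s0, v0, v1, v2)["b"] == s0["b"])    # b-phase1
print(machine(s0, v0, v1, v2)["a"] == s0["a"])   # a-machine
print(machine(s0, v0, v1, v2)["b"] == s0["a"])   # b-machine
```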

Each theorem can be proved by rewriting alone. We prove each with ACL2 
Version 2.6 (the first to include our algorithm) in each of two configurations. 
In “standard ACL2,” the algorithm is disabled; in “ν-ACL2,” the algorithm is 
enabled. All of the tests were conducted running under Allegro Common Lisp 
on a 731 MHz dual-processor Pentium III. Time is measured in seconds. The 
results are shown in Figure 1. 



Theorem            standard ACL2    ν-ACL2 
b-phase1                    0.48      0.01 
b-phase1-phase1           128.76      0.01 
a-phase1                    0.41      0.04 
a-machine                 139.39      0.02 
b-machine                 143.91      0.02 

Fig. 1. Seconds to Prove Theorems on 731 MHz Pentium III 



Note the growth in standard ACL2’s times from b-phase1 to b-phase1-phase1. 
Comparing the old rewriter’s performance with that of the new one 
on industrial data is essentially impossible because the old rewriter exhausts 
resources before completing interesting problems of the kind handled routinely by 
the improved system. (Adding one more phase1 step to b-phase1-phase1 causes 
standard ACL2 to exhaust memory after six hours of computation; ν-ACL2 does 
“b-phase1^3” in 0.11, b-phase1^4 in 6.46, and b-phase1^5 in 412 seconds.) 

The terms arising in typical machine models are not as regular as those 
in this test suite. Our algorithm does not distinguish “control” from “data,” 
require the identification of “phases,” or limit itself to single-threaded states. In 
addition, typical industrial machine states have hundreds of components. Some 
of those components are atomic (e.g., contain booleans, integers, etc.); others may 
themselves be structured as records or arrays. ACL2 supports states containing 
arrays and the simplification algorithm we have implemented does also. But in 
this paper we confine our attention to “flat” states. 

4 Terminology 

We now prepare to describe our algorithm precisely, starting with the terminol- 
ogy and conventions we use. In ACL2, let expressions are just syntactic sugar 
for lambda applications. Lambda expressions are handled just like other function 
applications. Each lambda expression has a list of formal variables and a term for 
a body. All free variables in the body are among the formals. Functions may only 
be applied to the correct number of actuals. The function application (f a_1 ... 
a_n) is equal to its beta reduction, the result of instantiating the body of f with 
the substitution replacing v_i by a_i. We use the verbs “to open” or “to expand” 
to describe the replacement of a function application by its beta reduction. If f 
is a lambda expression or f is a function symbol and that symbol is not used as 
a function symbol in the body of f, we say f is non-recursive. Henceforth, we 
do not talk formally about lets but about non-recursive function applications. 

In ACL2 the state accessor and updater functions are logically defined in 
terms of a “universal” accessor nth and a “universal” updater, update-nth, 
where (nth i s) extracts the i-th element of the list s and (update-nth i v 
s) constructs a list like s but whose i-th element is v. Thus, a term like (b 
(update-c x s)) expands to (nth 1 (update-nth 2 x s)). Our algorithm is 
fundamentally concerned with applying the theorem 

Theorem. nth-update-nth: 

(equal (nth i (update-nth j v s)) 
       (if (equal (nfix i) (nfix j)) v (nth i s))) 

as a rewrite rule (left-to-right). The function nfix is the identity on natural 
numbers and otherwise is 0. Its use in the theorem above is a reflection of the 
absence of syntactic typing in the language. The theorem says that the i-th 
component of the state produced by updating the j-th component of s with v is 
either v or the i-th component of s, depending on whether i and j are equal. The 
definitions of user-level state access/update functions (e.g., b and update-c) are 
treated as ordinary function definitions like phase1 above. 
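
The rule can be exercised on a small executable model. The Python sketch below is an approximation of the ACL2 functions, not their definitions: lists are Python lists, None stands in for nil, and indices count from zero (consistent with b being component 1 and c component 2 in the example above).

```python
def nfix(x):
    """Identity on natural numbers, 0 on everything else."""
    return x if isinstance(x, int) and not isinstance(x, bool) and x >= 0 else 0


def nth(i, s):
    """Zero-based element i of the list s; None (nil) past the end."""
    i = nfix(i)
    return s[i] if i < len(s) else None


def update_nth(j, v, s):
    """A list like s whose element j is v, padded with None if s is too short."""
    j = nfix(j)
    padded = list(s) + [None] * (j + 1 - len(s))
    padded[j] = v
    return padded


def check_nth_update_nth(i, j, v, s):
    """Compare both sides of nth-update-nth on concrete data."""
    lhs = nth(i, update_nth(j, v, s))
    rhs = v if nfix(i) == nfix(j) else nth(i, s)
    return lhs == rhs


state = ["a-val", "b-val", "c-val"]
print(nth(1, update_nth(2, "new-c", state)))   # reading b after writing c
print(all(check_nth_update_nth(i, j, "v", state)
          for i in (-1, 0, 1, 2, 5, "x") for j in (-1, 0, 1, 2, 5, "x")))
```

The non-natural indices in the last line show why nfix appears in the theorem: both -1 and "x" are treated as index 0.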

We call expressions like [*1] “nth/update expressions” or ν-expressions (for 
“nu” or “nth/update”). This loosely defined class of expressions includes state 
accessor/updater functions defined in terms of nth and update-nth, their array 
counterparts, if-then-else expressions, and variable binding constructs such as 
let or function or lambda application. 



5 Binding Stacks, Facets, and Reconciliation 

ACL2’s standard rewriter is inside-out. To rewrite (f a_1 ... a_n) it first rewrites 
the a_i to standardize them. Thus, the opportunity to apply nth-update-nth to 
(b (phase1 x s)) occurs only after (phase1 x s) is expanded to an update-nth 
expression. This may exponentially increase the size of the term. 

Instead of rewriting a_2 in (nth a_1 a_2) we wish to “look ahead” to see 
whether we can “see” a_2 as an update-nth expression, expanding non-recursive 
functions as necessary. For example [*1] can be seen as an update-c expression, 
which can, in turn, be seen as an update-nth expression. These expressions must 
be understood in an appropriate variable binding environment. Note that the 
update-c expression in [*1] is buried in the expression and would be reached late in 
the process of ordinary rewriting. By nth-update-nth, if the indices in the nth 
and update-nth expressions are the same, the answer is (new-c x s), under 
appropriate bindings for x and s; if the indices are unequal, the answer is (nth 
a_1 s), under appropriate bindings. Clearly, if we can decide the equality of the 
indices then work can be saved. (Often, in this setting, the indices are constants.) 
The challenge is to keep the bindings straight. 

Many applications require descending through hundreds of lambda expres- 
sions. We want to “be” inside the deepest lambda without creating the instance. 
We therefore introduce the idea of seeing a term in the context of a substitution 
and we represent the substitution as a stack of function call frames. This is just a 
generalized version of a nest of lambda applications. We call this object a “facet” 
and define it below. 

A binding stack is a stack of frames. Each frame contains a list of n variables 
and a list of n terms. The free variables occurring in the terms of a frame (other 
than the deepest frame) are among the variables of the frame immediately below. 

We represent stacks as lists, where the first element of the list is the top 
frame. Here is a stack containing two frames, 

(((a b) . ((afn u w) (bfn u v)))            ; frame 1 
 ((u w v) . ((ufn s) (wfn s) (vfn s)))).    ; frame 2 

Call this stack σ. In the top frame of σ, frame 1, a is associated with (afn u w) 
and b with (bfn u v). We say (afn u w) is the term corresponding to a in that 
frame. The representation of frames this way, rather than as association lists, 
makes them faster and cheaper to create. 

A stack represents the substitution created by pairing each variable in the 
top frame with the result of instantiating its corresponding term with the sub- 
stitution represented by the rest of the stack. Thus, the stack σ represents the 
substitution that replaces a by (afn (ufn s) (wfn s)) and b by (bfn (ufn 
s) (vfn s)). 

A facet is a pair consisting of a term t and stack σ, written <t, σ>, and 
represents the instance of t under the substitution represented by σ. Hence, if σ 
is the example stack above, the facet <(h a b), σ> represents (h (afn (ufn 
s) (wfn s)) (bfn (ufn s) (vfn s))). 
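
The two definitions above can be made concrete with a short Python sketch. The encoding is our own assumption: a term is a string (variable) or a tuple (function symbol followed by arguments), a frame is a pair (variables, terms), and a stack is a tuple of frames with the top frame first. The functions stack_substitution and represent are illustrative names, not ACL2 functions.

```python
def instantiate(term, subst):
    """Apply a substitution (dict from variable names to terms) to a term."""
    if isinstance(term, str):
        return subst.get(term, term)
    if isinstance(term, tuple):
        # term[0] is the function symbol; only the arguments are instantiated.
        return (term[0],) + tuple(instantiate(arg, subst) for arg in term[1:])
    return term                      # constants denote themselves


def stack_substitution(stack):
    """The substitution represented by a binding stack."""
    if not stack:
        return {}
    variables, terms = stack[0]
    below = stack_substitution(stack[1:])
    return {v: instantiate(t, below) for v, t in zip(variables, terms)}


def represent(term, stack):
    """The term represented by the facet <term, stack>."""
    return instantiate(term, stack_substitution(stack))


SIGMA = (
    (("a", "b"), (("afn", "u", "w"), ("bfn", "u", "v"))),            # frame 1
    (("u", "w", "v"), (("ufn", "s"), ("wfn", "s"), ("vfn", "s"))),   # frame 2
)

print(stack_substitution(SIGMA))
print(represent(("h", "a", "b"), SIGMA))
```

Note that represent builds the full instance, which is exactly the work the algorithm tries to avoid; it is shown here only to pin down what a facet means.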

When we refer to a facet as though it were a term, we mean to refer to its 
term component. An empty facet is one whose stack is the empty list, () . 

The function symbol of a non-variable, non-constant facet is the same as 
the function symbol of the term it represents. This allows us seldom to create 
the substitutions represented by stacks or the terms represented by facets. In- 
stead, we “chase” the variable bindings when we need them. Facets are similar 
to the records and binding environments of the structure sharing representation 
of clauses. Another way to think of a facet is that it is a nest of lambda ap- 
plications turned inside out and flattened. Given a nest of lambda applications, 
the term of the corresponding facet is the body of the innermost lambda expres- 
sion and the stack of the facet is the list of paired formals and actuals, starting 
with that for the innermost lambda application and proceeding outwards. Facets 
have two computationally convenient properties. First, if the term of a facet is 
an application of a defined non-recursive function, then we can represent the 
expansion of that function by a facet easily derived from the first. Second, if the 
term of a facet mentions a variable symbol then we can easily find out how that 
variable symbol is replaced by the substitution and we can represent the actual 
expression by another facet easily derived from the first. Lambda expressions are 
nested the “wrong way” to make these operations efficient. 

We define finite chains of facets related by a generalized notion of expansion. 
Let φ be the non-empty facet <t, σ>. Then its expansion, φ', is defined as 
follows. If t is a variable symbol that is not a member of the variables in the top 
frame of σ or t is a constant, φ' is <t, ()>. If t is a variable symbol that is a 
member of the variables in the top frame of σ, φ' is <t', σ'>, where t' is the 
term corresponding to t in the top frame and σ' is the result of removing the top 
frame from σ. If t is the application of a defined non-recursive function, f, with 
formals v and body b, to actual expressions a, φ' is <b, ((v . a) . σ)>, i.e., 
the facet whose term is the body of f and whose stack is obtained from σ by 
pushing a new frame containing the formals and actuals. In all other cases no 
expansion is possible. 
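
A minimal Python sketch of one expansion step follows, under our own encoding: terms are strings, integers, or tuples; a facet is a pair (term, stack); a stack is a tuple of (formals, actuals) frames with the top frame first; and defs is a hypothetical table of non-recursive definitions. One reading is assumed here that the text does not spell out: a function application is expanded even when the stack is empty, since otherwise the empty facet of a top-level term could never be opened.

```python
def expand(facet, defs):
    """One expansion step; returns None when no expansion is possible."""
    term, stack = facet
    if isinstance(term, tuple):                      # function application
        head, actuals = term[0], term[1:]
        if isinstance(head, tuple) and head[0] == "lambda":
            return (head[2], ((head[1], actuals),) + stack)
        if head in defs:                             # defined, non-recursive
            formals, body = defs[head]
            return (body, ((formals, actuals),) + stack)
        return None
    if not stack:
        return None                                  # variable or constant, empty stack
    variables, terms = stack[0]
    if isinstance(term, str) and term in variables:
        return (terms[variables.index(term)], stack[1:])
    return (term, ())                                # constant or unbound variable


def expansion_chain(facet, defs):
    """All facets reachable by repeated expansion, starting with `facet`."""
    chain = [facet]
    while (nxt := expand(chain[-1], defs)) is not None:
        chain.append(nxt)
    return chain


DEFS = {
    "b": (("s",), ("nth", 1, "s")),
    "update-c": (("x", "s"), ("update-nth", 2, "x", "s")),
}

for step in expansion_chain((("b", ("update-c", "x", "s")), ()), DEFS):
    print(step)
```

The last element of the chain has (nth 1 s) as its term, with s bound in the stack to the still unexpanded (update-c x s).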

All the facets in an expansion chain represent equal terms. We call them 
“facets” because they are different ways of looking at a term. 

The preferred facet of a facet φ is the last facet in its expansion chain. Note 
that since update-nth is a recursive function, if φ can be seen as an instance 
of an update-nth term by sufficient expansions of non-recursive functions, then 
the preferred facet of φ will have an update-nth term as its term component. 

Given a facet we can economically create a term equal to the one it represents, 
using lambda abstraction. The lambda abstraction of the facet <b, ()> is the 
term b. The lambda abstraction of <b, ((v . a) . σ)> is the lambda abstraction 
of <((lambda v b) a), σ>. Note the bindings of the abstraction occur in the 
opposite order. The size of the lambda abstraction of a facet is linear in the size 
of the facet. 
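
Lambda abstraction amounts to a single loop over the stack. The Python sketch below uses our own encoding (a lambda expression is the tuple ("lambda", formals, body), and a stack is a tuple of (formals, actuals) frames, top first). The helper beta is included only to confirm, on the running example, that the abstraction denotes the same term as the facet.

```python
def lambda_abstract(term, stack):
    """Term equal to the one represented by <term, stack>, linear in its size."""
    for formals, actuals in stack:           # top frame first, so it ends up innermost
        term = (("lambda", formals, term),) + tuple(actuals)
    return term


def instantiate(term, subst):
    if isinstance(term, str):
        return subst.get(term, term)
    if isinstance(term, tuple):
        # Lambda expressions in head position are closed and left untouched.
        return (term[0],) + tuple(instantiate(arg, subst) for arg in term[1:])
    return term


def beta(term):
    """Fully beta-reduce a term (used here only as a check)."""
    if not isinstance(term, tuple):
        return term
    head, args = term[0], tuple(beta(arg) for arg in term[1:])
    if isinstance(head, tuple) and head[0] == "lambda":
        _, formals, body = head
        return beta(instantiate(body, dict(zip(formals, args))))
    return (head,) + args


SIGMA = (
    (("a", "b"), (("afn", "u", "w"), ("bfn", "u", "v"))),
    (("u", "w", "v"), (("ufn", "s"), ("wfn", "s"), ("vfn", "s"))),
)

abstraction = lambda_abstract(("h", "a", "b"), SIGMA)
print(abstraction)
print(beta(abstraction))
```

The optimizations mentioned next (dropping unused formals and constant bindings) are not implemented in this sketch.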

An important optimization of lambda abstraction is to eliminate unnecessary 
bound variables. If the body of a lambda does not use a variable symbol that is 
listed in the formals, it and the corresponding actual can be eliminated. Another 
optimization is that variables bound to constants can be eliminated. 

Because we will manipulate facets in lieu of the terms they represent, we will 
also have occasion to form new facets by putting together several others. 

For example, let φ_i, 1 ≤ i ≤ n, be n facets, each of the form <t_i, σ_i>. Each 
φ_i represents a term r_i. Think of the φ_i as having been generated by applying 
our algorithm to the arguments of a call of some function f. We wish to represent 
the term (f r_1 ... r_n) as a facet. We call this the reconciliation of (f φ_1 ... 
φ_n). Note that (f φ_1 ... φ_n) is neither a term nor a facet. It fails to be a 
term because it contains facets. It fails to be a facet because there is no single, 
outermost stack. 

The reconciliation of (f φ_1 ... φ_n) is computed as follows. We first find the 
greatest common ancestor stack, σ, of the σ_i. Let ρ_i be the top part of σ_i, down 
to the common ancestor σ. Thus, σ_i is the concatenation of ρ_i and σ. Let a_i be 
the lambda abstraction of the facet <t_i, ρ_i>. Then <(f a_1 ... a_n), σ> is the 
reconciliation of (f φ_1 ... φ_n) and is a facet that represents a term equal to 
(f r_1 ... r_n). 

Reconciliation has two important optimizations. The first is that preferred 
constant facets, i.e., facets whose terms are constant expressions, have empty 
stacks. If these empty stacks participate in the greatest common ancestor com- 
putation, the ancestor stack is always (), meaning the reconciled subexpressions 
share no subterms. But constants denote themselves in any stack. So we ignore 
constant facets when determining the ancestor. The second optimization of rec- 
onciliation exploits an empirical observation. Frequently all the non-constant 
facets in a reconciliation have the same stack. In that case, that stack is the 
ancestor. This case arises so frequently (in 98% of the cases over a test involving 
roughly 100,000 reconciliations) that it is worthwhile to code for it. 
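
Reconciliation, including the first optimization, can be sketched in Python as follows. The encoding is an assumption of ours: a facet is a pair (term, stack), a stack is a tuple of (formals, actuals) frames with the top frame first, and the greatest common ancestor is taken to be the longest common suffix of the stacks, compared by equality. The same-stack fast path described above is not coded separately; it falls out as the case where the common suffix is the whole stack.

```python
def is_const(term):
    return not isinstance(term, (str, tuple))


def lambda_abstract(term, stack):
    for formals, actuals in stack:
        term = (("lambda", formals, term),) + tuple(actuals)
    return term


def common_ancestor_depth(stacks):
    """Number of frames in the longest common suffix of the given stacks."""
    k = 0
    while stacks and all(len(s) > k and s[-1 - k] == stacks[0][-1 - k] for s in stacks):
        k += 1
    return k


def reconcile(fn, facets):
    """Facet representing (fn r_1 ... r_n), where facet i represents r_i."""
    stacks = [s for t, s in facets if not is_const(t)]   # constants are ignored
    k = common_ancestor_depth(stacks)
    ancestor = stacks[0][len(stacks[0]) - k:] if stacks else ()
    # rho_i is the part of sigma_i above the ancestor; abstract only that part.
    args = [lambda_abstract(t, s[:max(len(s) - k, 0)]) for t, s in facets]
    return ((fn,) + tuple(args), ancestor)


F1 = (("a", "b"), (("afn", "u", "w"), ("bfn", "u", "v")))
F2 = (("u", "w", "v"), (("ufn", "s"), ("wfn", "s"), ("vfn", "s")))

print(reconcile("g", [("a", (F1, F2)), ("u", (F2,)), (7, ())]))
print(reconcile("g", [("a", (F1, F2)), ("b", (F1, F2))]))   # same stack: no lambdas
```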

6 Our Algorithm 

We now describe an algorithm for simplifying a term by applying nth-update-nth 
and expanding functions. We call the rewriter the “ν-rewriter.” The algo- 
rithm operates on facets. To use it on terms we apply it to the empty facet 
containing the term and then we lambda abstract the resulting facet. 

The ν-Rewrite Algorithm 

1. We wish to ν-rewrite the facet φ. Let φ' be the preferred facet of φ. If 
   φ' is a variable or constant facet or the term of φ' does not begin with 
   nth, we return φ'. 

2. Otherwise, φ' is <(nth i t), σ>. Let î be the facet obtained by 
   ν-rewriting <i, σ>. Let t̂ be the preferred facet of <t, σ>. If t̂ is a 
   variable or constant, we reconcile and return (nth î t̂). 

3. At this point, we know t̂ is a function application. Since t̂ is a pre- 
   ferred facet, its term is not a lambda application. Let f be the function 
   symbol of t̂. Our code considers five cases on f: it is if, update-nth, 
   update-nth-array, nth, or some other symbol. 

   3.1. If f is if, then t̂ is of the form <(if a b c), ρ>. Let φ_1 
        be the result of reconciling and ν-rewriting (nth î <b, ρ>) 
        and let φ_2 be the result of reconciling and ν-rewriting (nth î 
        <c, ρ>). 

        3.1.1. If φ_1 and φ_2 are the same facet, return φ_1. 

        3.1.2. If no applications of nth-update-nth were made 
               in producing φ_1 or φ_2, then return the reconciliation of 
               (nth î t̂). 

        3.1.3. Otherwise, let φ_0 be the result of ν-rewriting 
               <a, ρ>. 

               3.1.3.1. If φ_0 is a constant facet, return φ_2 or φ_1 
                        according to whether the constant is nil (i.e., 
                        the test of the if can be decided). 

               3.1.3.2. Otherwise, return the reconciliation of 
                        (if φ_0 φ_1 φ_2). 

   3.2. If f is update-nth, t̂ is of the form <(update-nth j v 
        s), ρ>. Let ĵ be the result of ν-rewriting the facet <j, ρ>. 

        3.2.1. If î and ĵ represent equal naturals, we return the 
               result of ν-rewriting the facet <v, ρ>. 

        3.2.2. If î and ĵ represent unequal naturals, we return 
               the result of ν-rewriting the reconciliation of 
               (nth î <s, ρ>). 

        3.2.3. Otherwise, we return the reconciliation of (if (equal 
               (nfix î) (nfix ĵ)) <v, ρ> (nth î <s, ρ>)). 

   3.3 and 3.4. If f is either update-nth-array or another nth, 
        then (assuming the original term was derived from a state ac- 
        cess/update nest) we are dealing with an array or some other 
        structured component. To keep this paper brief, we do not dis- 
        cuss that case here, but it is analogous to what we have described. 

   3.5. If f is some other symbol, then we return the reconciliation 
        of (nth î t̂). 
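
The steps above can be put together in a compact Python sketch. It is an illustration under stated assumptions, not the ACL2 implementation: terms are strings, integers, None (for nil), or tuples; facets are (term, stack) pairs; DEFS is a hypothetical table of non-recursive definitions; cases 3.3 and 3.4 are omitted and fall through to the generic case; and there is no caching.

```python
def is_const(t):
    return not isinstance(t, (str, tuple))           # integers and None (nil)

def is_nat(t):
    return isinstance(t, int) and t >= 0

def expand(facet, defs):
    term, stack = facet
    if isinstance(term, tuple):
        head, actuals = term[0], term[1:]
        if isinstance(head, tuple) and head[0] == "lambda":
            return (head[2], ((head[1], actuals),) + stack)
        if head in defs:
            formals, body = defs[head]
            return (body, ((formals, actuals),) + stack)
        return None
    if not stack:
        return None
    variables, terms = stack[0]
    if isinstance(term, str) and term in variables:
        return (terms[variables.index(term)], stack[1:])
    return (term, ())

def preferred(facet, defs):
    nxt = expand(facet, defs)
    while nxt is not None:
        facet, nxt = nxt, expand(nxt, defs)
    return facet

def lambda_abstract(term, stack):
    for formals, actuals in stack:
        term = (("lambda", formals, term),) + tuple(actuals)
    return term

def reconcile(fn, facets):
    stacks = [s for t, s in facets if not is_const(t)]
    k = 0
    while stacks and all(len(s) > k and s[-1 - k] == stacks[0][-1 - k] for s in stacks):
        k += 1
    ancestor = stacks[0][len(stacks[0]) - k:] if stacks else ()
    args = [lambda_abstract(t, s[:max(len(s) - k, 0)]) for t, s in facets]
    return ((fn,) + tuple(args), ancestor)

def nu_rewrite(facet, defs):
    """Return (facet, used); `used` says whether nth-update-nth was applied."""
    phi = preferred(facet, defs)
    term, sigma = phi
    if not (isinstance(term, tuple) and term[0] == "nth"):
        return phi, False                                        # step 1
    i_hat, _ = nu_rewrite((term[1], sigma), defs)                # step 2
    t_hat = preferred((term[2], sigma), defs)
    tt, rho = t_hat
    if isinstance(tt, tuple) and tt[0] == "if":                  # step 3.1
        phi1, u1 = nu_rewrite(reconcile("nth", [i_hat, (tt[2], rho)]), defs)
        phi2, u2 = nu_rewrite(reconcile("nth", [i_hat, (tt[3], rho)]), defs)
        if phi1 == phi2:
            return phi1, u1 or u2                                # 3.1.1
        if not (u1 or u2):
            return reconcile("nth", [i_hat, t_hat]), False       # 3.1.2
        phi0, _ = nu_rewrite((tt[1], rho), defs)                 # 3.1.3
        if is_const(phi0[0]):
            return (phi2 if phi0[0] is None else phi1), True
        return reconcile("if", [phi0, phi1, phi2]), True
    if isinstance(tt, tuple) and tt[0] == "update-nth":          # step 3.2
        _, j, v, s = tt
        j_hat, _ = nu_rewrite((j, rho), defs)
        rest = reconcile("nth", [i_hat, (s, rho)])
        if is_nat(i_hat[0]) and is_nat(j_hat[0]):
            target = (v, rho) if i_hat[0] == j_hat[0] else rest
            return nu_rewrite(target, defs)[0], True             # 3.2.1, 3.2.2
        test = reconcile("equal", [reconcile("nfix", [i_hat]),
                                   reconcile("nfix", [j_hat])])
        return reconcile("if", [test, (v, rho), rest]), True     # 3.2.3
    return reconcile("nth", [i_hat, t_hat]), False               # steps 2 and 3.5

def if_k(k):
    return ("if", ("v0", k, ("a", "s")), ("v1", k, ("a", "s")), ("v2", k, ("a", "s")))

def let_s(value, body):                                          # (let ((s value)) body)
    return (("lambda", ("s",), body), value)

DEFS = {
    "a": (("s",), ("nth", 0, "s")),
    "b": (("s",), ("nth", 1, "s")),
    "update-a": (("v", "s"), ("update-nth", 0, "v", "s")),
    "phase1": (("s",), let_s(("update-a", if_k(1), "s"),
                             let_s(("update-a", if_k(2), "s"), "s"))),
}

result, used = nu_rewrite((("b", ("phase1", "s")), ()), DEFS)
print(lambda_abstract(*result), used)
```

In this run the rewriter skips over both updates of the a field without ever opening the if expressions that compute the new values, which is the saving the algorithm is after. The phase1 used here has two updates rather than the six of the test suite.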

7 Discussion 

The algorithm focuses entirely on terms of the form (nth i t). The main case 
split is on the form of t. 

In paragraph 3.1 we consider the case that t can be seen as an if-then- 
else expression. We might be ν-rewriting a term like (nth i (if a b c)), but 
more often we are ν-rewriting a term like (nth i (phase a s)), where phase 
is defined to be a nest of lets with an if expression as the body. 

Observe that in attacking (nth i (if a b c)) we first “distribute” the if, 
moving the nth onto b and c. After rewriting these two subgoals we ask whether 
the resulting facets are equal. If so, we can avoid rewriting a by virtue of (if x 
y y) = y. Of course, we might have chosen to rewrite a first and determined that 
it is equal to nil, say, thereby avoiding the need to rewrite b. But the ν-rewriter 
has relatively little support for deciding propositions (since it is context free and 
does not use the ACL2 type system or other decision procedures). 

To see why the “(if x y y)” heuristic so often wins, consider the origins 
of the problem. Here b and c are state transformations, the modeled machine is 
branching on a, and we are interested in determining the i-th component of the 
new state. But most state transformations on the machines we have seen leave 
most state components unchanged. Thus, in many cases neither b nor c changes 
the value of the i-th component and our heuristic makes the superior choice. 

In paragraph 3.1.2 we basically abandon the rewriting of (nth i (if a b 
c)) if no nth-update-nth rule was applied while rewriting (nth i b) or (nth i 
c). We prefer to keep the if inside the nth to avoid case splitting. To implement 
the test, the ν-rewriter returns a flag that indicates whether it used any rules. It 
is insufficient to test whether the rewritten facets are equal to their unrewritten 
versions since quite often b and c will have been replaced by their preferred facets 
(i.e., we may have opened function applications). 

Paragraph 3.2 is the case for which the algorithm was invented. It applies 
the nth-update-nth theorem. 

Paragraphs 3.3 and 3.4 deal with arrays in our setting and are not discussed 
here. 

We have optimized the algorithm in several ways. The most important is to 
use caching or memoization to avoid recomputing the ν-rewrite of a previously 
seen facet. In our implementation, we use a hash table with 64K entries, each of 
which is a ring containing (at most) the five most recently seen facets that hashed 
to that location and the results of the corresponding ν-rewrites. Even though 
we hit on a hash entry only approximately 6% of the time, we find that the 
savings is significant and, indeed, makes the difference between being practical 
or impractical on industrial-scale problems. 
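
A cache with this shape can be sketched in Python as below. The class name RingCache, its method names, and the use of Python's built-in hash are our assumptions; the actual implementation hashes Lisp s-expressions. Each bucket is a bounded deque, so inserting a sixth entry silently drops the oldest one.

```python
from collections import deque

MISS = object()   # sentinel, so that None can be stored as a result


class RingCache:
    """Fixed-size hash table whose buckets keep only the most recent entries."""

    def __init__(self, buckets=64 * 1024, ring_size=5):
        self.buckets = [None] * buckets
        self.ring_size = ring_size
        self.probes = 0
        self.hits = 0

    def _ring(self, key):
        slot = hash(key) % len(self.buckets)
        if self.buckets[slot] is None:
            self.buckets[slot] = deque(maxlen=self.ring_size)
        return self.buckets[slot]

    def get(self, key):
        self.probes += 1
        for seen, result in self._ring(key):
            if seen == key:
                self.hits += 1
                return result
        return MISS

    def put(self, key, result):
        self._ring(key).append((key, result))   # evicts the oldest when full


cache = RingCache()
facet = (("nth", 1, "s"), ())
if cache.get(facet) is MISS:
    cache.put(facet, "rewritten-facet")
print(cache.get(facet), cache.hits, cache.probes)
```

A rewriter would probe the cache with the facet on entry and store the result on exit; because entries can be evicted, a miss only costs recomputation and never affects correctness.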

Recall the tests in Section 3. Consider the theorem there called 
b-phase1-phase1. Implementing the algorithm without caching gives rise to 10,236 calls 
of the ν-rewriter. With caching, that theorem generates 124 calls. Of those 124 
calls, 18 hit in the cache, giving a cache hit rate of 14%. Each hit, however, saves 
the algorithm from re-exploring a potentially large subtree. 

In practical applications, the cache is of supreme importance. For example, 
in a theorem taken from the proprietary Rockwell test suite, the cached version 
of the ν-rewriter was called 216,524 times. The cache hit rate was 6.2%. But 
without the cache the algorithm would require about $3 \times 10^{26}$ calls.¹ 

Because of our desire to cache the results, we have made the ν-rewriter com- 
pletely “context-free.” That is, it does not take any arguments that encode the 
hypotheses governing the current term, since to do so would mean that we would 
have to cache that contextual information and probably have to probe the cache 
to look for prior calls in weaker contexts rather than identical contexts. 

For a discussion of several elaborations of the algorithm, how it is used in 
ACL2’s rewriter, and some proposed improvements, see 

http://www.cs.utexas.edu/users/moore/publications/nu-rewriter 



¹ The number is 338,664,298,746,582,325,860,641,409. This is too large to compute by 
the brute force method of eliminating the cache and counting calls. It was computed 
by using the cache to remember how much work was done for each entry. 



8 Related Work 

A term representation similar to our facets is provided by the “term module” 
of Hickey and Nogin’s modular theorem proving architecture. Their notion 
of “delayed substitution” is motivated by the same considerations that led us to 
introduce facets. Their framework is more general than ours; in particular, they 
provide utilities for fast tactic-based theorem proving. However, their approach 
to delayed substitution is, essentially, to use lambda applications to represent 
terms and to implement the operations of destructuring such terms without 
doing the substitution implied by beta reduction. Our facet data structure is 
more efficient for the operations we support. This is important when dealing 
with very deep lambda nests. 

Our notion of reconciliation, which is designed to generate a facet from a 
term-like structure containing facets, has no counterpart in their system because 
their “facets” are already terms. We can afford reconciliation because, as noted, 
about 98% of the time the facets to be reconciled all have the same stack. 

Their architecture does not provide caching, which we have found is 
crucial to good performance on large problems. 

Facets are suggestive but independent of “explicit substitution” logics. 
Our view of facets is that they merely provide an efficient data structure for 
implementing certain simplification strategies in conventional logics. The idea of 
“nameless” substitutions might be usefully incorporated in future work. 



9 Conclusion 

Our algorithm is being tested under fire in industrial applications. We are still 
“tuning” our integration of the algorithm, focusing on tactics for using it and cer- 
tain low-level implementation details. Of particular interest are the management 
of the cache and the associated hashing function used to cache Lisp s-expressions. 
The algorithm sometimes generates unnecessarily large intermediate expressions 
as suggested by the b-phase1^n series mentioned in Section 3. We are working on 
preventing these explosions. 

Nonetheless, the ν-rewriter has been extremely effective in the full-scale in- 
dustrial application for which it was developed for Rockwell Collins. It has been 
used in the proofs of hundreds of theorems that were previously well beyond the 
capability of ACL2 to simplify. We take this as a good sign but still regard this 
as a work in progress. 

Abstract. In this paper we propose a methodology for verifying the se- 
quential consistency of caching algorithms. The scheme combines times- 
tamping and an auxiliary history table to construct a serial execution 
‘matching’ any given execution of the algorithm. We believe that this 
approach is applicable to an interesting class of sequentially consistent 
algorithms in which the buffering of cache updates allows stale values to 
be read from cache. We illustrate this methodology by verifying the high- 
level specifications of the lazy caching and ring algorithms. 



In shared memory multiprocessor systems a memory consistency model speci- 
fies how memory operations will appear to execute to the programmer. The closer 
the memory consistency model forces the shared memory to behave as a serial 
memory system - a system in which all operations are performed atomically
directly on memory with no buffering or caching (Figure 1(a)) - the easier it is for
the programmer to write correct code for the system. However, the stricter the 
memory model the more hardware and compiler optimizations are disallowed. 
Sequential consistency is an intuitive memory model, in which, “the result of 
any execution is the same as if the [memory] operations of all the processors 
were executed in some sequential order, and the operations of each individual 
processor appear in this sequence in the order specified by the program” El- 
Sequential consistency is a relatively restrictive model when compared with the 
more relaxed memory models (such as partial or total store ordering, or release 
consistency) which are supported by some commercially available architectures 
(e.g. PowerPC, SPARC, Digital Alpha) p. 

Many sequentially consistent models implement coherence, an even stricter 
consistency model. Whereas an execution is sequentially consistent if all of the 
processors’ local views can be interleaved to form a single serial behavior, regard- 
less of the relative ordering of events at different processors, coherence requires 
that the events, as ordered globally, be a trace of serial memory [2].

To prove sequential consistency of a proposed memory implementation M
it suffices to construct, for every execution $\sigma_M$ of M, a matching serial
execution $\sigma_S$ such that all operations in $\sigma_S$ read and write the same values as in
$\sigma_M$. However, the creation of such a “witness” serial execution may require that a

* Research supported in part by a grant from the German-Israel bi-national GIF foun- 
dation and a gift from Intel. 



G. Berry, H. Comon, and A. Finkel (Eds.): CAV 2001, LNCS 2102, pp. 423-^23 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 
Fig. 1. Architecture of (a) a serial memory and (b) the lazy caching algorithm.



potentially unbounded number of operations be re-ordered. In fact, the problem 
of verifying sequential consistency is known to be undecidable 0- Thus, unlike 
coherence which can often be verified quite easily, sequential consistency does not 
comfortably fit the pattern of standard refinement techniques (trace inclusion, 
bisimulation, testing preorder). The non-coherent lazy caching algorithm was 
therefore proposed by Rob Gerth as an example on which different refinement 
methods can be tried m, and in 1999 a special edition of Distributed Computing 
was devoted to this project H3|. 

In this paper we present a proof methodology which involves timestamping 
the cache reads and shared memory updates of an execution and placing them 
in a history table. Intuitively, every processor $P_i$ has a cache $C_i$ which contains a
subset of the values in the shared memory at some time $t_i \le t_G$, where $t_G$ is the
global system time. All writes to memory occurring in the interval $(t_i, t_G]$ have
not yet been applied to $C_i$. The local time $t_i$ is precisely the time at which the
global memory had contents consistent with $C_i$. We timestamp instructions with
the local time (and other information, in order to create a total ordering between 
instructions executing at the same local time) and place them in a history table 
ordered by timestamp. The information in the history table contains sufficient 
information for a matching serial execution to be built, and the algorithm to be 
proved sequentially consistent. 

We believe that this methodology is suitable for the verification of the
sequential consistency of many non-coherent memory models, as demonstrated by
our applying this proof method, using the PVS [27] theorem prover, to two
examples, lazy caching and a ring algorithm. While this methodology
is theoretically applicable to coherent snoopy protocols, we believe that it is
more complicated than is required for such algorithms. Current work considers 
increasing the automation of deductive proofs, and we hope later to consider the 
application of the methodology to other classes of caching algorithms. 

The paper is structured as follows: In Section 1 we describe the lazy caching
algorithm. In Section 2 we explain how timestamping and the history table are



^ The PVS files are available at ^I- 
Fig. 2. Lazy Caching Transitions.



used to derive a serial execution. In Section 3 we define the ring algorithm and
describe how it fits into our methodology. Section 4 discusses related work
and in Section 5 we summarize our conclusions.

1 Lazy Caching 

The “lazy cache algorithm” [2] is a sequentially consistent protocol in which cache
updates can be postponed, and writes are buffered, allowing processors to access 
stale cache data. 

As illustrated in Figure 1(b), the system consists of n processors, P1, . . . , Pn,
with each Pi owning a cache Ci and FIFO in- and out-queues Ini and Outi,
respectively. We have further associated with each processor an unbounded
instruction list, containing instructions of the form “read a” and “write a, d”.
Instructions in the instruction list are executed sequentially, with a program
counter, pci, pointing to the next instruction.

A processor Pi initiates a write event Wi by placing a record holding the
instruction address and new value at the tail of Outi. When this record reaches
the head of Outi it can be popped off and the memory write MWi occurs. That
is, the shared memory is updated, and a new record holding the address and
value is placed in the in-queue Inj of every processor Pj. The copy placed in Ini
is starred. When the entry at the head of Ini is popped off a cache update CUi
occurs, and Ci is updated with the value recorded in the Ini entry.

A read event Ri can be performed if the address a requested is in the cache,
Outi is empty and Ini does not contain any starred entries. The value read is
that in the cache. We note that this value may differ from that in the memory if a
write to a is buffered in Ini. Locations which are not currently in the cache can be
brought into the cache by placing the memory value in the in-queue in a memory
read (MRi) action, and can be summarily evicted by cache invalidation (CIi).
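The transitions described above can be sketched as a small executable model. The Python below is an illustration only, not the PVS specification used in the proof: the class name `LazyCache`, its method names, and the representation of queue entries as tuples are all assumptions made for this sketch, and program counters and instruction lists are omitted.

```python
from collections import deque


class LazyCache:
    """Toy model of the lazy caching transitions (illustrative names only)."""

    def __init__(self, n, initial=0):
        self.mem = {}                              # shared memory: address -> value
        self.initial = initial                     # value of never-written addresses
        self.cache = [dict() for _ in range(n)]
        self.inq = [deque() for _ in range(n)]     # entries: (addr, value, starred)
        self.outq = [deque() for _ in range(n)]    # entries: (addr, value)

    def write(self, i, a, d):                      # W_i: buffer the write in Out_i
        self.outq[i].append((a, d))

    def memory_write(self, i):                     # MW_i: pop head of Out_i
        a, d = self.outq[i].popleft()
        self.mem[a] = d
        for j, q in enumerate(self.inq):           # broadcast; copy in In_i is starred
            q.append((a, d, j == i))

    def memory_read(self, i, a):                   # MR_i: fetch memory value via In_i
        self.inq[i].append((a, self.mem.get(a, self.initial), False))

    def cache_update(self, i):                     # CU_i: apply head of In_i
        a, d, _starred = self.inq[i].popleft()
        self.cache[i][a] = d

    def cache_invalidate(self, i, a):              # CI_i: evict an address
        self.cache[i].pop(a, None)

    def can_read(self, i, a):
        no_star = not any(starred for _a, _d, starred in self.inq[i])
        return a in self.cache[i] and not self.outq[i] and no_star

    def read(self, i, a):                          # R_i: may return a stale value
        if not self.can_read(i, a):
            raise RuntimeError("read not enabled")
        return self.cache[i][a]


# Two processors; both first bring address "a" into their caches.
m = LazyCache(2)
m.memory_read(0, "a"); m.cache_update(0)
m.memory_read(1, "a"); m.cache_update(1)
m.write(0, "a", 6)
m.memory_write(0)
stale = m.read(1, "a")          # In_1 still buffers the write of 6
blocked = m.can_read(0, "a")    # In_0 holds a starred entry
m.cache_update(1)
fresh = m.read(1, "a")
```

In this run processor 1 reads the stale value while the update sits in its in-queue, whereas the writer itself cannot read until its own starred entry has been applied.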
Fig. 3. (a) A partial execution of the lazy caching algorithm. All transitions refer
to address a. Time increases from left to right. (b) A matching serial execution,
where “read” and “write” instructions correspond to R and MW events.



In our interleaving model at any step a processor can either initiate a read
or write (if one is enabled), pop an entry off its in- or out-queue if it is
non-empty, initiate a cache update, invalidate a cache entry, or idle (i). The system
is parameterized by the number of processors and there is no restriction on the
maximum size of the queues, the address space, or the set of memory values. Our
model, summarized in Figure 2, very closely resembles that of Gerth. The
reader is referred to that paper, or our PVS source files, for more information.



An Example Execution Fragment. In Figure 3(a) we consider a very small
execution sequence which illustrates the non-coherent nature of the lazy caching
algorithm. We assume that address a has initial value 0. Process P1 initiates a
write of 6 to a, placing the tuple (a, 6) on its out-queue. Process P2 then initiates
a write of 8 to a. Process P2 pops (a, 8) off Out2, in a memory write MW2 action,
pushing the (address, data) tuple onto the in-queues of all processors. Sometime
thereafter action MW1 also occurs. Process P3 reads the value of 0 for a, updates
its cache with 8, and then reads 8 as the value of a, while the write of 6 is
buffered. Process P4 updates its cache with both values before reading
a as 6; process P5 reads a as 0.

We note that the memory is updated in the opposite order to which the writes 
were initiated, and thus a has the final value of 6. Furthermore, processors P3 
and P5 read stale values for a after P4 has read the new value. 



2 Creating a Serial Execution 

To prove an algorithm sequentially consistent we show that each of its execu- 
tions has an equivalent serial execution. In the serial execution all operations 
are executed directly on memory, in some sequential order, and the operations 
of each individual processor are in program order, where “read” and “write” 
instructions correspond to R and MW events. It is shown that reads in the two
executions return the same value, and the final memory values are identical.
Figure 3(b) gives the serial execution corresponding to the lazy caching execution
of Figure 3(a).

2.1 Logical Time 

Each processor has a view of memory which is consistent with the values memory 
had at some time in the past: It sees the memory as it was before it was modified 
by the last x writes, these being the writes which are buffered in the in-queue. 

The global time $t_G$ is determined by an auxiliary global clock, and is initially
zero. Every time a memory write occurs the global time is incremented by one.

Each processor has an auxiliary local clock which counts the number of writes 
which have been applied to its cache. This clock gives its local time. It is updated 
each time a process performs a cache update which was initiated by a memory 
write. These cache updates are termed countable. (In order to distinguish count- 
able cache updates from those initiated by memory reads, we add an auxiliary 
processor id field to in-queue records. An entry is the result of a memory read 
exactly if the processor id in the record is that of the processor and the record is 
not starred.) The processor has a view of memory consistent with the values that 
memory held when the global time was the current local time of the processor. 

Every read (r) or memory write (mw) event in the system is given a unique
timestamp when it occurs. The timestamp is a tuple $(t, r, id)$, where $t$ is the local
time at which the event occurs, $r$ is the number of reads which this processor
has performed since the last countable cache update, and $id$ is the identifier of
the processor that initiated the read/write. On a read $R_i(a, d)$ we add to the
history table $H$ an entry $R_i(a, d)$, its timestamp $(t_i, r_i + 1, i)$ and the current
program counter, $pc_i$, of $P_i$. The local read counter, $r_i$, is incremented by 1. On
a memory write $MW_j(a, d)$ we add to the history table $H$ an entry $MW_j(a, d)$, its
timestamp $(t_G + 1, 0, j)$ and $pc_j$, and we set $t_G := t_G + 1$. On a countable cache
update $CU_k$ we set $t_k := t_k + 1$, $r_k := 0$.

The timestamps induce a strict order on memory events: 

$$(t_1, r_1, id_1) \prec (t_2, r_2, id_2) \iff t_1 < t_2 \lor \big(t_1 = t_2 \land (r_1 < r_2 \lor (r_1 = r_2 \land id_1 < id_2))\big)$$

Time 0 is the time given to all reads of the initial, unmodified memory. For
every ti > 0 the “smallest” timestamp with time ti will always be a memory
write (mw), as the reads field of a timestamp is zero exactly when it represents
a memory write operation. Since the local clocks are incremented every time 
that a cache update is performed, there is only one memory write at time ti and 
all other operations timestamped with t = ti are reads. As they are all reads 
from the same memory, with no intervening writes, they will return the same 
value irrespective of the ordering between them. However, it is desirable that the 
program order of each processor be maintained, and this is done by the reads 
field of the timestamp. The id field of the timestamp is used to order operations 
at the same local time by different processors. The relative ordering of these 
operations is unimportant, and ours is one of a number of possibilities.
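The strict order on timestamps is a lexicographic order on the triple (time, reads, id). As a minimal sketch, the Python below writes the definition out directly and checks that it coincides with Python's built-in tuple comparison; the function name `precedes` and the sample timestamps are chosen for illustration and are not taken from the proof scripts.

```python
def precedes(ts1, ts2):
    """Strict timestamp order, spelled out clause by clause."""
    t1, r1, id1 = ts1
    t2, r2, id2 = ts2
    return t1 < t2 or (t1 == t2 and (r1 < r2 or (r1 == r2 and id1 < id2)))


# Sample timestamps (time, reads since last countable update, processor id).
stamps = [(1, 1, 3), (0, 1, 5), (2, 0, 1), (0, 1, 3), (1, 0, 2), (2, 1, 4)]

# Python compares tuples lexicographically, so sorted() realises the same order.
ordered = sorted(stamps)
agrees = all(precedes(a, b) == (a < b) for a in stamps for b in stamps)
```

Within each time value the entry with a zero reads field (the memory write) sorts first, followed by the reads performed at that local time.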

These counters and timestamps are variants of Lamport clocks. However,
in our system each processor updates its clock independently, without reading 
the timestamps on incoming messages. 
Fig. 4. An execution of the lazy caching algorithm with history table and matching
serial execution. (a) Building the history table. (b) The history table ordered
by timestamp. (c) A serial execution.



2.2 Extracting a Serial Execution from the History Table 

The history table is an ordered list of entries sorted in non-decreasing order 
of timestamp. Since memory writes always have a greater timestamp than any 
other elements in the table at the time they occur they are appended to its end. 
Reads, however, may be inserted in the middle of the history table. The function
size(H) returns the number of entries in H. For every x < size(H), H[x] refers
to the x’th entry of H.

In Figure 4(a) we revisit the example of Section 1, showing how the history
table would be constructed. For each processor the table records its local time t, 
the value it stores for a, and r, the number of reads it has performed since the 
last countable cache update. The timestamp column indicates the timestamp of 
the entry which is added to the history table at the step in which it is added. 
Time progresses from top to bottom in the table. 

A serial execution can be derived from the history table such that the i’th 
entry in the history table corresponds to the i’th operation in the serial execution. 
It is proved that in this serial execution every processor issues its instructions 
in the same order as in the original execution, all reads return the same values 
as in the lazy caching execution, and the final memory values are the same as 
in the original execution. 
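To make the construction concrete, the following Python sketch maintains the auxiliary clocks for a single address, inserts entries into a timestamp-ordered history table, and then replays the table as a serial execution. It is an illustration under simplifying assumptions (one address, no program counters, values supplied by the caller rather than by a cache model); the names `Tracker` and `replay_serially` are hypothetical. The events fed in are those of the example execution fragment of Section 1.

```python
import bisect


class Tracker:
    """Auxiliary clocks and history table for one address (illustrative names)."""

    def __init__(self, procs):
        self.t_global = 0
        self.t_local = {p: 0 for p in procs}   # countable cache updates applied
        self.reads = {p: 0 for p in procs}     # reads since last countable update
        self.history = []                      # kept sorted by timestamp

    def on_read(self, p, value):
        self.reads[p] += 1
        ts = (self.t_local[p], self.reads[p], p)
        bisect.insort(self.history, (ts, "read", value))   # may land mid-table

    def on_memory_write(self, p, value):
        self.t_global += 1
        # Largest timestamp so far, so the entry lands at the end of the table.
        bisect.insort(self.history, ((self.t_global, 0, p), "write", value))

    def on_countable_update(self, p):
        self.t_local[p] += 1
        self.reads[p] = 0


def replay_serially(history, initial):
    """Run the table in order on a plain memory cell; return reads and final value."""
    mem, observed = initial, []
    for _ts, kind, value in history:
        if kind == "write":
            mem = value
        else:
            observed.append(mem)
    return observed, mem


h = Tracker([1, 2, 3, 4, 5])
h.on_memory_write(2, 8)        # MW_2
h.on_memory_write(1, 6)        # MW_1
h.on_read(3, 0)                # P3 reads the stale initial value
h.on_countable_update(3)       # P3 applies the write of 8
h.on_read(3, 8)
h.on_countable_update(4)       # P4 applies both writes
h.on_countable_update(4)
h.on_read(4, 6)
h.on_read(5, 0)                # P5 has applied nothing yet

timestamps = [ts for ts, _kind, _value in h.history]
recorded = [v for _ts, kind, v in h.history if kind == "read"]
observed, final = replay_serially(h.history, initial=0)
```

The values observed by the serial replay agree with the values recorded in the lazy caching run, and the final memory value is that of the last memory write.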
In Figure 4(b) we present the history table built in the example of Figure 4(a),
with entries ordered by timestamp. The table illustrates all the fields in the
history table. Figure 4(c) illustrates the serial execution which is derived.

2.3 The Proof 

The auxiliary history list H, the memHist array and the readValues arrays are
intrinsic to the presented proof. Each processor has a readValues array which maps
instruction indices to values. Every time a read operation occurs the value read
is stored in the relevant entry of the readValues array. This array is later used to
ensure that the lazy caching and serial executions return identical values for every
read. The memHist array is a history of memory contents, where memHist[t]
is a copy of the shared memory at global time t. In addition, memHist also
stores for every time t the processor id and program counter for the instruction
that updated memory from memHist[t − 1] to memHist[t]. We also found it
useful to add auxiliary fields to the in- and out-queue entries: in addition to the
address, value and star fields, we added auxiliary fields recording the processor
id and program counter of the related instruction, and the global time at which
the related event occurs. We note that this time field is not used to update the
processors’ local clocks, or any other variables. Some of the data structures are
detailed in Figure 5.

In order to construct the serial execution we prove a one-to-one relationship
between executed operations and history table entries. The bulk of the proof
effort involved manually defining properties of the lazy caching algorithm and then
proving their invariance in the PVS system. We list some of the invariants
used in the proof.

For every two entries H[x] and H[y] of history table H with timestamps
$(t_x, r_x, id_x)$ and $(t_y, r_y, id_y)$ respectively, and $x, y < size(H)$:

— If $x \ne y$ then $(t_x, r_x, id_x) \ne (t_y, r_y, id_y)$. (Distinct entries have distinct
timestamps.)

— $x < y$ iff $(t_x, r_x, id_x) \prec (t_y, r_y, id_y)$. (H is ordered by timestamp.)

— Entry H[x] corresponds to a memory write operation iff $r_x = 0$.

— If $x \ne y$, $t_x = t_y$ and $r_x = 0$ then $r_y \ne 0$. (At most one memory write at any
global time.)

— For all $0 < t \le t_G$ there is an index $z < size(H)$ such that H[z] is
timestamped $(t, 0, id)$ for some id. (Every time period greater than zero is initiated
by a memory write.)

— For all $0 < r < r_x$, there is an entry H[z], $z < x$, timestamped $(t_x, r, id_x)$ in
H. (Reads are counted sequentially, with no gaps in the counting.)

— The time $t_x$ is not greater than the global time $t_G$, and if $t_x$ is greater than
the local time $t_{id_x}$ then there is an entry in $In_{id_x}$ corresponding to H[x].

— The contents of memHist for the current global time equal the current
memory. That is, $memHist[t_G] = Mem$.

— For every address a and processor $P_i$ with cache $C_i$ and local time $t_i$,
$C_i(a).valid \rightarrow C_i(a).data = memHist[t_i](a)$. The values of locations in the
cache match the memHist values for the processor’s local time.

— For every occupied entry $In_i[k]$ of $In_i$, $t_i \le In_i[k].t \le t_G$, and if $t_i = In_i[k].t$
then $In_i[k]$ records a non-countable cache update. Intuitively, for every t
such that $t_i < t \le t_G$ there is an $In_i$-entry which will be used to update $t_i$.

— The program counter H[x].pc is less than $pc_{id_x}$.

— For every value pc less than the program counter $pc_i$ of $P_i$ either there is an
entry H[z], $z < size(H)$, with timestamp $(t_z, r_z, i)$ such that H[z].pc = pc,
or there is an entry of $Out_i$ corresponding to this instruction.

— The value $P_{id_x}.readValues[H[x].pc] = memHist[t_x](a)$ where a is the
address in the pc’th instruction of $P_{id_x}$. (The values in the readValues array
match the memHist values for the time of the transition.)
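Several of the invariants above concern only the timestamps in the history table and can be checked directly on a concrete table. The Python function below is a testing aid sketched for illustration, not a substitute for the PVS invariance proofs; the name `check_history` and the representation of the table as a list of (t, r, id) tuples are assumptions of this sketch, and the invariants involving queues, caches and program counters are not covered.

```python
def check_history(H, t_global):
    """H: list of (t, r, id) timestamps in table order. Returns the failed checks."""
    failed = []
    if len(set(H)) != len(H):
        failed.append("distinct timestamps")
    # Tuple comparison is the lexicographic timestamp order.
    if any(not (H[k] < H[k + 1]) for k in range(len(H) - 1)):
        failed.append("ordered by timestamp")
    writes = [t for (t, r, _id) in H if r == 0]      # r == 0 marks a memory write
    if len(set(writes)) != len(writes):
        failed.append("at most one write per time")
    if not set(range(1, t_global + 1)) <= set(writes):
        failed.append("every time in 1..t_global starts with a write")
    present = set(H)
    for (t, r, pid) in H:
        # Every smaller read count must also appear for the same time and processor.
        if any((t, k, pid) not in present for k in range(1, r)):
            failed.append("reads counted without gaps")
            break
    if any(t > t_global for (t, _r, _id) in H):
        failed.append("no timestamp beyond global time")
    return failed


good = [(0, 1, 3), (0, 1, 5), (1, 0, 2), (1, 1, 3), (2, 0, 1), (2, 1, 4)]
bad = [(0, 1, 3), (0, 3, 3)]          # read counter jumps from 1 to 3

good_result = check_history(good, t_global=2)
bad_result = check_history(bad, t_global=0)
```

The first table is the one obtained from the example execution of Section 1; the second violates only the sequential counting of reads.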

The serial execution is inductively built in a list S where S[x].mem and
S[x].procs give the global memory and processor states in the serial system
after x execution steps. Intuitively, the x’th entry of S corresponds to the x’th
entry of H, for all x < size(H). That is, in the serial execution transitions occur
in the order in which they appear in the history table.

We now define predicate $\alpha$ which describes the relationship between the lazy
caching data structures L and S. For clarity we prefix data structures in the lazy
caching algorithm with L where confusion could arise.

1. The first entry, S[0], fulfills the initial conditions of the serial system.

2. For every $0 \le x < size(H)$, $\rho_{serial}(S[x], S[x+1])$. That is, there is a transition
in the serial system from S[x] to S[x+1].

3. For every $0 \le x < size(H)$, S[x].mem = L.memHist[H[x].t]. That is, the
global memory at the x’th entry in S matches the memory recorded in
L.memHist for time H[x].t.
Fig. 6. An Example Configuration in the Ring Algorithm.



4. For every processor Pi and program index p, if the p’th instruction of Pi
is a read instruction then S[size(H)].readValues[i, p] = L.readValues[i, p].
That is, every read in the two systems returns the same value.

5. The program counter of processor Pi at the end of the sequential execution,
S[size(H)].pci, is equal to L.pci if L.Outi is empty, and the (auxiliary)
program counter field in the top L.Outi entry, otherwise.

We prove inductively that for every reachable lazy caching state L there is an
S such that $\alpha(L, S)$. We first prove that predicate $\alpha$ holds for the initial states
of the two systems, and then that if $\alpha(L, S)$ holds, then for any L' such that
$\rho_{lazy}(L, L')$ is a lazy caching transition, we can build an S' such that $\alpha(L', S')$.

From parts (1) and (2) of $\alpha$, S records a legal serial execution. Given that
$L.memHist[t_G]$ is proved to equal L.Mem, the current lazy caching memory,
from (3) we can deduce that the memory values in the two systems agree. From 
(4) we prove that both systems return the same value for every read. 

We complete the proof by showing that the lazy caching system can always 
progress meaningfully. 



3 The Ring Algorithm 

In order to test the applicability of our methodology we applied it also to a 
model based on Collier’s ring algorithm [0|: 

Processors P0, . . . , Pn-1 are connected in a ring, with Pi sending messages
only to its successor, P(i+1) mod n. The channels between every two successive
processors are FIFO queues of messages. Processor P0 is designated the supervisor.
If processor Pi, i ≠ 0, wants to perform a write of value v to address a it sends to
its successor a WriteRequest(a, v) message and enters a waiting state. This write
request is passed around the ring until it reaches the supervisor. The supervisor 
updates memory with this address and value, and then sends a WriteReturn(a, 
v) message. On receiving a WriteReturn message all processors update their 
caches, and then pass it on to their successor. Process Pi also releases itself from 
its waiting state and can proceed. When the write return reaches the supervisor, 
it is removed from the system. 

A processor can execute a read instruction if the address is in its cache.
Otherwise it sends a ReadRequest, which the supervisor answers with a ReadReturn.
After thus bringing the address into the cache, the read can be executed. 
The supervisor accesses memory directly (its local cache is the “shared
memory”) and never issues ReadRequest or WriteRequest messages. On performing a
write it sends a WriteReturn message so that all other caches can be updated. 

This model fits neatly into our framework. As in the lazy caching example, 
cache reads and updates to the shared memory are entered into the history 
table when they occur. (In this algorithm the memory update occurs when a
WriteReturn is initiated by the supervisor.) The supervisor increments its local
clock when it sends a WriteReturn, and all other processors increment their
local clocks on receiving the WriteReturn. The local time of the supervisor is the
global system time. The local time of Pi is the global time minus the number
of WriteReturns on channels between P0 and Pi. An example configuration is
given in Figure 6.
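The relationship between local and global time in the ring can be illustrated with a small simulation. The Python below models only the WriteReturn traffic (write and read requests are omitted, and the supervisor performs writes directly); the class name `Ring` and its methods are assumptions made for this sketch rather than part of the verified model.

```python
from collections import deque


class Ring:
    """WriteReturn traffic in the ring; request messages are omitted."""

    def __init__(self, n):
        self.n = n
        self.clock = [0] * n                       # clock[0] is the global time
        self.cache = [dict() for _ in range(n)]    # cache[0] plays the shared memory
        self.chan = [deque() for _ in range(n)]    # chan[i]: from P_i to its successor

    def supervisor_write(self, a, v):
        self.cache[0][a] = v                       # memory update
        self.clock[0] += 1                         # global time advances
        self.chan[0].append((a, v))                # WriteReturn(a, v) enters the ring

    def deliver(self, i):
        """The successor of P_i receives the oldest WriteReturn on chan[i]."""
        a, v = self.chan[i].popleft()
        j = (i + 1) % self.n
        if j == 0:
            return                                 # back at the supervisor: removed
        self.cache[j][a] = v
        self.clock[j] += 1
        self.chan[j].append((a, v))                # pass it on

    def in_flight_before(self, i):
        """WriteReturns on the channels between P_0 and P_i."""
        return sum(len(self.chan[k]) for k in range(i))

    def clocks_consistent(self):
        return all(self.clock[i] == self.clock[0] - self.in_flight_before(i)
                   for i in range(self.n))


ring = Ring(4)
ring.supervisor_write("a", 6)
ring.supervisor_write("a", 8)
ring.deliver(0)                 # P1 applies the write of 6
ring.deliver(1)                 # P2 applies the write of 6
snapshot = list(ring.clock)
ok_midway = ring.clocks_consistent()
ring.deliver(2)                 # P3 applies the write of 6
ring.deliver(3)                 # the first WriteReturn leaves the ring
ok_later = ring.clocks_consistent()
```

At the snapshot, each processor's clock lags the supervisor's clock by exactly the number of WriteReturn messages that have not yet reached it.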

4 Related Works 

Various methodologies, ranging from CSP m, to abstraction HE| and model 
checking m have been used to verify lazy caching. The primary difficulty in 
verifying lazy caching seems to be that at the time that a memory is updated 
by a write in the lazy caching system, it is not known how many reads reading 
the stale value will still occur. That is, nondeterministic choices in the abstract 
(serial) system occur earlier than in the concrete (lazy caching) system. One 
solution is to input the computation of the concrete system into a transducer, 
which queues segments of the concrete computation until they can be matched 
with an abstract execution m- Similarly, m propose a finite state observer 
that observes and re-orders the memory operations, while P2] use an auxiliary 
queue to record writes which have updated memory but have not yet updated the 
cache. Step-wise refinement, in which the lazy caching system is transformed in a 
number of steps to a serial system, is used in |SI and Composition 1201 and 
abstraction [I I are two other methodologies proposed, while in j2| decomposition 
is coupled with the use of CSP to prove trace inclusion. 

The paper introducing lazy caching P| presents a semantic proof that it is 
sequentially consistent. A WriteCounter is used to assign sequence numbers
to updates of the shared memory. Reads are assigned numbers according to the 
last write which the processor has popped off its in-queue. An auxiliary Hist 
variable is used, with semantics similar to that of our memHist variable. 

Of the above mentioned verification efforts only m has been mechanized 
at all. The model-checking verification in m is of a restricted system in which 
there is no out-queue and the in-queue is of size at most one. Given the problems
of state explosion, it is unclear how a more detailed system could be verified. It 
is claimed that the type of abstractions that are used in m could be computed 
algorithmically, thus partially mechanizing this proof. 

Timestamping, using variants of logical Lamport clocks 1231 , has been used 
to verify various memory consistency models PEI The algorithms are verified 
at a lower level than we have considered, including message passing protocols. 
Timestamping is used to divide logical time into coherence epochs, intervals of 
logical time in which a node has read-only or read-write access to a block of data. 
Thus, it is possible for one epoch to contain multiple, or no, stores. Furthermore
the same write can be given different timestamps when it is used to update dif- 
ferent caches. In contrast, in our timestamping each memory update is identified 
with an epoch and has a unique timestamp. This underscores a difference in our 
approaches to memory consistency - whether block control or memory contents 
are the primary concern. The difference in emphasis is appropriate given the dif- 
ferent levels (high level versus message passing) at which verification occurs, and 
the different algorithms considered. The proofs presented are entirely manual. 

Theorem proving has been used by Park and Dill II 111 21 and Stoy et al 
to verify cache coherence protocols at the message passing level. Park and Dill 
aggregate the steps of each transaction in the implementation into a single atomic 
transition in the specification. A eommit point is identified, for each transition, 
and the aggregation function intuitively is a function completing committed 
instructions. This methodology has been used to effectively verify a detailed 
model of the complex FLASH protocol. However, it is unclear how it could 
be used in our examples, where instructions may commit out of order (a read 
instruction may return an older value than a previous read, by another processor, 
for the same address). In |2B| a PVS implementation of Lamport’s TLA 
is used. Queues are drained to empty them of messages, and an abstraction 
function used to show refinement between two protocols. 

A lot of research has been done on using model checking to verify cache 
coherence protocols. However, due to the difficulties of verifying large systems 
many of these methodologies are restricted. E.g., the ‘test model-checking’ of HH 
in incomplete, the work by Delzanno, Pong and Dubois PIT^ based on FSMs is 
only appropriate to coherent algorithms. Lazic m shows that data independence 
theorems can be used to make model checking of cache protocols more tractable. 

Our construction of a serial execution is reminiscent of work by Glusman 
and Katz m- They allow independent operations to be re-ordered to create 
a convenient computation. Our “convenient” serial execution is not only a re- 
ordering of the events, but also a change in the nature of the occurring events. 

There are more points of similarity between our work and those mentioned 
above. The auxiliary variables in parr?] perform some of the functions of our 
history table. While timestamping has been used previously in verifying cache 
consistency protocols |S|, the similarities between this work and ours are in 
the terminology more than the semantics. Our timestamping is closer in mean- 
ing to the WriteCounter variable in |2j. Their Hist variable is also similar to 
our memHist variable. However, the proof in |3 is ‘on a semantical level and 
not grounded in a refinement methodology ’US!. By creating a full timestamp- 
ing scheme, and using a history table, we have developed a formal verification 
framework which allows mechanical verification, and can easily be applied to 
different verification problems. 

The centrality of the history table, and the method in which it is coupled with 
timestamping is new, and provides a relatively simple proof which is amenable to 
mechanical verification. We believe that mechanical verification provides a higher 
degree of confidence than pen and paper proofs, and testifies to a relatively simple 
and natural methodology. 
5 Conclusion

In this paper we present a refinement methodology for the verification of sequen- 
tial consistency. Given that the general problem is known to be undecidable, our 
proof method cannot be complete. However, we believe that there is a class 
of ‘difficult’, non-coherent algorithms, to which this methodology is suited, as 
illustrated by the successful verification of the lazy caching and ring algorithms. 

We take cache reads and shared memory updates to be the important events 
to be recorded, and show that a correct ordering of these events allows the
construction of a matching serial execution. While the idea of using timestamps (or,
more generally, Lamport clocks) to order events is far from new, the timestamp- 
ing that we have devised is particularly well suited to sequential consistency. It 
allows us to give a relative order (timestamp) to an “important event” , when it 
occurs, relative to all past and possible future such events in the system. The 
history table provides a means of dynamically ordering these events, so that a 
serial execution can be extracted. 

The methodology is sound - when it is applied a corresponding serial execu- 
tion can be built. Since all steps are mechanically verified in the PVS theorem 
prover, this gives a very solid proof of sequential consistency. 

The major drawback of this methodology is the large amount of human
effort required (several person-weeks), devoted primarily to deriving the invariant
properties and directing the theorem prover. We are currently researching
techniques to increase the automation of the proofs, and hope later to consider the
extension of our methodology to other classes of algorithms.
Abstract. The usefulness of Bounded Model Checking (BMC) based on
propositional satisfiability (SAT) methods for bug hunting has already been proven
in several recent works. In this paper, we present two industrial-strength systems
performing BMC for both verification and falsification. The first is Thunder, which
performs BMC on top of a new satisfiability solver, SIMO. The second is Forecast,
which performs BMC on top of a BDD package. SIMO is based on the
Davis-Logemann-Loveland procedure (DLL) and features the most recent search methods.
It enjoys static and dynamic branching heuristics, advanced back-jumping and
learning techniques. SIMO also includes new heuristics that are specially tuned for
the BMC problem domain. With Thunder we have achieved impressive capacity and
productivity for BMC. Real designs, taken from Intel’s Pentium® 4, with over 1000
model variables were validated using the default tool settings and without manual
tuning. In Forecast, we present several alternatives for adapting BDD-based model
checking for BMC. We have conducted a comparison of Thunder and Forecast on a
large set of real and complex designs and on almost all of them Thunder has
demonstrated a clear win over Forecast in two important aspects: capacity and
productivity.

1 Introduction 

The success of formal verification is no longer measured in its ability to verify 
interesting design behaviors; it is measured in its contribution to the correctness of 
the design in comparison to the contribution of other validation methods, i.e., 
simulation. Therefore, technologies and methodologies that enhance the productivity 
of formal verification are of special interest. Our research identifies Bounded Model 
Checking (BMC) based on propositional satisfiability (SAT) to be such a technology. 

BMC based on SAT methods [bcrz99, bccz99, sht00] has recently been
introduced as a complementary technique to BDD-based Symbolic Model Checking. 
The basic idea is to search for a counterexample in executions whose length is 
bounded by some integer k. Given this bound, the model checking problem can be 
efficiently reduced to a SAT problem, and can therefore be solved by SAT methods 
rather than BDDs. 
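The bounded search can be illustrated on a toy system. In the Python sketch below, a path s0, ..., sk is a counterexample if s0 satisfies the initial condition, each consecutive pair satisfies the transition relation, and some state on the path violates the property. Exhaustive enumeration stands in for the SAT solver that a real BMC tool would call, so this shows only the shape of the bounded query, not how Thunder or Forecast work internally; the function `bmc` and the two-bit counter example are assumptions made for illustration.

```python
from itertools import product


def bmc(init, trans, bad, n_vars, k):
    """Look for a path s0..sk with init(s0), trans(si, si+1), and bad(sj) for some j.

    Brute-force enumeration replaces the SAT solver of a real BMC tool.
    """
    states = list(product([False, True], repeat=n_vars))
    for path in product(states, repeat=k + 1):
        if (init(path[0])
                and all(trans(path[i], path[i + 1]) for i in range(k))
                and any(bad(s) for s in path)):
            return path
    return None


def value(s):
    """Interpret a pair of Booleans as a two-bit number."""
    return 2 * s[0] + s[1]


# Two-bit counter starting at 0; the property to refute is "the counter never reaches 3".
init = lambda s: value(s) == 0
trans = lambda s, t: value(t) == (value(s) + 1) % 4
bad = lambda s: value(s) == 3

shallow = bmc(init, trans, bad, n_vars=2, k=2)   # bound too small to reach 3
deep = bmc(init, trans, bad, n_vars=2, k=3)      # counterexample of length 3
```

With bound 2 no counterexample exists, which says nothing about longer executions; raising the bound to 3 exposes the violating path.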
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In this paper, we report our detailed evaluation of SAT -based BMC at an 
industrial setting. Our initial interest in BMC and SAT technology has been due to 
the several recent papers [bcrz99, bccz99, shtOO] that have compared BDD-based 
model checking to SAT-based model checking and have concluded that many of the 
(BDD-based model checking) hard problems can easily be solved by SAT-based 
model checkers. The test cases used in the comparisons reported in [shtOO] were 
drawn from the internal benchmark of a state-of-the-art BDD based symbolic model 
checker, RuleBase [bee96a, bee97a]. Therefore, in [shtOO], no definite conclusions 
could be derived on the capacity benefit of the SAT technology, since all the 
verification cases were in the capacity ballpark of RuleBase. Although Biere et al. 
report in [bcrz99] that their SAT-based BMC consistently outperformed the BDD- 
based symbolic model checker, SMV, the results that they convey are on verification 
test cases made up of hundreds of sequential elements and inputs well in the capacity 
range of BDD-based symbolic model checkers. 

Furthermore, prior comparisons [sht00] leave open the question whether the
difference in performance and capacity is due to the underlying technology (BDD
versus SAT) or to the difference between bounded and unbounded model
checking. Moreover, in both [bcrz99] and [sht00], no extensive expert configuration
and tuning was done when extracting the performance numbers for the BDD-based
model checkers that were compared with tuned SAT-based bounded model checkers.

In order to understand the benefit of bounded model checking and SAT
technology in a formal-verification setting, we undertook the task of developing
industrial-strength BMC using both BDD and SAT algorithms and have thus
provided the means for a fair comparison. On one hand, we have optimized Intel’s
unbounded BDD-based model checker, Forecast, for bounded model checking. On
the other hand, we have developed a state-of-the-art SAT-based bounded model
checker, Thunder.

Since our interest in SAT technology was in addressing the productivity problem 
of the current formal verification techniques, we have evaluated the benefits of BDD- 
based and SAT-based bounded model checking with respect to productivity. We have 
built a performance benchmark made up of a large number of hard real-life 
falsification test cases chosen from the unbounded Forecast’s internal benchmark 
base. For each problem, we have built a falsification version that results in a 
counterexample of minimal length k, and a verification version of length k-1. In this 
manner, we have evaluated the power of SAT based bounded model checking for 
both verification and falsification. 

In order to understand the benefits of SAT technology with respect to
productivity, we tuned both Thunder and Forecast for the domain of bounded model
checking and came up with a default best configuration for both engines. Since it is
very hard to measure the tuning effort, we have compared tuned and default Forecast
versus default Thunder. Surprisingly, the default and best setting for Thunder was the
same for all the test cases in the benchmark. Thunder significantly
outperformed untuned Forecast, and its performance was very similar to tuned Forecast
for almost all the cases, except for a few cases that could not be verified by any
setting of Thunder. The performance benchmark therefore showed a clear
productivity gain achieved by Thunder through a drastic reduction of the user
ingenuity and tuning effort needed to run the tools.

The capacity benchmark that we extracted by eliminating the pruning directives on
all the test cases of the performance benchmark demonstrated that Thunder with no
pruning effort could verify most of the test cases. These benchmarks, corresponding
to circuits with thousands of sequential elements and inputs, are far beyond the
capacity of Forecast and of any other BDD-based symbolic model checker.
Therefore, the conclusion from the capacity benchmark was that Thunder has
impressive capacity (it can verify designs with thousands of inputs and sequential
elements) and potentially increases the productivity of the verification engineer by
reducing the pruning effort significantly.

Thunder reads in RTL models, e.g., written in Verilog or VHDL, together with
a set of assumptions and assertions expressed in our new temporal specification
language, ForSpec [arm01]. Thunder is compatible with a wide range of recently
developed, state-of-the-art SAT solvers (e.g., GRASP, SATO, Prover). We report the
benchmark results of Thunder based on a new SAT solver, SIMO, developed at the
University of Genova. SIMO is based on the Davis-Logemann-Loveland procedure
(DLL) [dll62]. Similar to other state-of-the-art DLL-based algorithms, SIMO’s
strength is based on: (1) advanced procedures for choosing the next variable on
which to split the search and (2) advanced backtracking mechanisms. SIMO features
various forms of backtracking. In particular, besides the standard backtrack to the last
choice point, SIMO implements a Conflict-directed BackJumping schema, CBJ, and
CBJ-with-Learning [dec90a, pro93a, bs97a]. The CBJ-with-Learning algorithm was
chosen as the best setting following intensive benchmarking with real-life test
cases. In the context of heuristics to choose the splitting variable, we evaluated a
wide range of known dynamic heuristics, both greedy (e.g., MOMS) and Boolean
Constraint Propagation (BCP) [fre95a] based (e.g., Unit), and introduced a new
dynamic heuristic, Unirel2, that proves to be the best for the Intel bounded-model
checking benchmark. Unirel2 is a domain-specific heuristic, since it gives preference
to model variables and also takes into account the simplification imposed on the
auxiliary variables. A previous evaluation [sht00] of splitting heuristics
reported static heuristics to be a clear winner over dynamic heuristics. Our results
are not compatible with [sht00] in the sense that for our benchmark the dynamic
splitting heuristic, Unirel2, worked much better than the static heuristics available
in SIMO. Since we have not evaluated Unirel2 against the original static heuristics
introduced in [sht00], our conclusion is limited to the observation that dynamic
splitting heuristics tuned for the domain of bounded model checking, as Unirel2 is,
can be very robust for industrial-size designs. Our intensive evaluation clearly
pinpointed Unirel2 and CBJ-with-Learning as the winning setting of Thunder for
Intel’s benchmark.

Our BDD-based model checker, Forecast, is built on top of a powerful BDD
package, and contains most of the recently published state-of-the-art algorithms for
symbolic model checking. In addition to the unbounded model checking algorithms
in Forecast, we developed bounded ones in order to give BDD-based BMC a fair
chance in the comparisons against Thunder. We tried to get an automatic (not
requiring additional human tuning) default setting for Forecast, as we have done for
Thunder. We were not able to get a default setting that is good for all the test cases,
nor an automatic static BDD variable ordering that beats the best humanly tuned
variable order. Therefore, we compare both the best default setting and the tuned
setting for Forecast with the default setting of Thunder. The comparison reveals the
productivity boost gained by Thunder, since the default setting of Thunder clearly
outperforms the default setting of Forecast and is very competitive with the tuned
Forecast setting.

In summary, the unique contribution of this work is in the adaptation of
unbounded BDD-based model checking to bounded model checking, optimizations
of SAT-based methods (mainly dynamic splitting heuristics) for bounded model
checking, and a thorough and fair evaluation of bounded model checking on SAT
versus BDD-based model checking, making use of a rich set of real-life complex
verification and falsification test cases.

The paper is organized as follows. In Section 2, we give an overview of Thunder
and present experimental results that demonstrate the best SIMO and CNF generator
configuration for Thunder. Section 3 describes our effort to achieve the best results
for BMC on BDDs. In Section 4 we present experimental results comparing Thunder
with Forecast. Section 5 describes our conclusions and future research directions.



2 Thunder: Bounded Model Checker on SAT 

Thunder, our bounded checker on SAT technology, resembles the work of Biere et
al. [bccz99] in the reduction of the symbolic model checking problem to a bounded
model checking problem and consequently to the problem of propositional
satisfiability. Thunder, which makes use of a powerful DLL-based engine, SIMO, as
its default SAT engine, is also compatible with other state-of-the-art SAT engines
such as GRASP, SATO, Prover Plug-In™ [PPI, sta89]. We report in this paper our
experience of Thunder with SIMO, since our contribution is mainly in the tuning of
DLL-based algorithms in the context of bounded model checking.

2.1 Transforming the Bounded Model Checking Problem to Formulas 

The basic idea in SAT based bounded model checking is to consider only paths of 
bounded length k and to construct a propositional formula that is satisfiable iff there 
is a counterexample of length k. BMC is concerned with finding counterexamples of 
limited length k, and thus it targets falsification and partial verification rather than 
full verification. 

In order to fully verify a property one needs to look for longer and longer 
counterexamples by incrementing the bound k, until reaching the diameter of the 
finite state machine [bccz99]. However, the diameter might be very large in some 
examples, and there is no easy way to compute it in advance. This issue is addressed
in [sss00], which incorporates induction into BMC and thereby allows the algorithm
to be used both for verification and falsification.

Assume that we have a finite state machine M with initial states I and transition 
relation TR, where both I and TR are encoded symbolically as Boolean formulas. 
Assume also, that we want to check if an invariance property P holds for all states 
reachable in a bounded number of steps. It is sufficient to focus only on invariance 
properties since the safety specifications expressed in our temporal language, 
ForSpec, are compiled into such invariance properties. 

Our experience shows that the performance and capacity of Thunder are very
dependent on the way we generate the propositional formulae describing the
counterexample. Similarly to CMU’s implementation of BMC, Thunder provides
different settings that we describe below. We also provide experimental results that
compare the various settings.

The propositional formula describing a path from $s_0$ to $s_k$ requires $s_0$ to be an
initial state and also that there is a transition from $s_i$ to $s_{i+1}$ for $0 \le i < k$:

$$Path(s_0, \ldots, s_k) = I(s_0) \land TR(s_0, s_1) \land \cdots \land TR(s_{k-1}, s_k)$$

Thunder implements three different checks for a counterexample (similar to what is 
provided in CMU’s BMC tool). The first one, referred to as bound k, looks for a 
violation of P in all the cycles from 0 to k: 

$$Path(s_0, \ldots, s_k) \land (\neg P(s_0) \lor \cdots \lor \neg P(s_k))$$

The second check, referred to as exact k, looks for a violation of P exactly in the last 
cycle k: 

$$Path(s_0, \ldots, s_k) \land \neg P(s_k)$$

Finally, the third check, referred to as exact-assume k, looks for a violation of P at 
cycle k and assumes P to be true in all the cycles from 0 to k-1: 

$$Path(s_0, \ldots, s_k) \land P(s_0) \land \cdots \land P(s_{k-1}) \land \neg P(s_k)$$

As expected, using exact or exact-assume is significantly faster than bound, but
they also solve an easier problem. For the sake of a fair comparison with BDD model
checking, all the results in this section are obtained with bound. We will return in
Section 5 to the exact and exact-assume checks, since they are the only ones that can
cope with the capacity-challenging examples presented there.
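
The three checks can be illustrated with a minimal Python sketch. It is not Thunder's implementation: instead of generating CNF and calling a SAT solver, it enumerates all state sequences of a toy machine, and the machine (a 2-bit counter), the property P and every function name are assumptions made only for this illustration.

```python
from itertools import product

# Hypothetical toy machine: a 2-bit counter that starts at 0 and counts modulo 4.
STATES = range(4)

def I(s):                      # initial-state predicate
    return s == 0

def TR(s, t):                  # transition relation
    return t == (s + 1) % 4

def P(s):                      # invariant under test: the counter never reaches 3
    return s != 3

def path(states):
    """Path(s_0..s_k): s_0 is initial and consecutive states are related by TR."""
    return I(states[0]) and all(TR(a, b) for a, b in zip(states, states[1:]))

def bound_check(states):
    """'bound k': P is violated somewhere in cycles 0..k."""
    return path(states) and any(not P(s) for s in states)

def exact_check(states):
    """'exact k': P is violated in the last cycle k."""
    return path(states) and not P(states[-1])

def exact_assume_check(states):
    """'exact-assume k': P holds in cycles 0..k-1 and is violated in cycle k."""
    return path(states) and all(P(s) for s in states[:-1]) and not P(states[-1])

def satisfiable(check, k):
    """Brute-force stand-in for the SAT call: try every sequence of k+1 states."""
    return any(check(seq) for seq in product(STATES, repeat=k + 1))
```

For this counter, `satisfiable(bound_check, 2)` is false and `satisfiable(bound_check, 3)` is true, so the shortest counterexample has length 3. At k = 7 the exact check is satisfiable while exact-assume is not, because P is already violated at cycle 3.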

We also implemented the Bounded Cone of Influence (BCOI) optimization
proposed in [bcrz99]. This optimization rarely has a negative effect, so we use it
by default; all the results below are obtained with BCOI enabled. Our
experiments used a DLL-based SAT solver, SIMO [tac00], described in the next
section.



2.2 DLL Based Satisfiability Engine - SIMO 

Like many other modern SAT solvers, SIMO [tac00] is based on the well-known
Davis-Logemann-Loveland (DLL) algorithm [dll62]. DLL assumes the propositional 
formula to be in Conjunctive Normal Form (CNF) and it employs a backtracking 
search. At each node of the search tree, DLL assigns a Boolean value to one of the 
variables that are not resolved yet. The search continues in the corresponding sub-tree 
after propagating the effects of the newly assigned variable, using Boolean Constraint 
Propagation (BCP) [fre95a]. BCP is based on iterative application of the unit clause 
rule. The procedure backtracks once a clause is found to be unsatisfiable, until either 
a satisfying assignment is found or the search tree is fully explored. The last case 
implies that the formula is unsatisfiable. 
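
The procedure just described can be sketched in a few lines of Python. This is a minimal illustration and not SIMO's code: clauses are lists of non-zero integers (a negative integer denotes a negated variable), the splitting rule is a naive "first free variable" placeholder, and the function names `bcp` and `dll` are chosen here for convenience.

```python
def bcp(clauses, assignment):
    """Apply the unit clause rule until a fixpoint. Returns False on conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var in assignment:
                    if assignment[var] == (lit > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return False              # every literal is false: conflict
            if len(unassigned) == 1:      # unit clause: the last literal is forced
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return True

def dll(clauses, assignment=None):
    """Return a satisfying assignment (dict: variable -> bool) or None."""
    assignment = dict(assignment or {})
    if not bcp(clauses, assignment):
        return None                       # dead end: the caller backtracks
    variables = sorted({abs(l) for c in clauses for l in c})
    free = [v for v in variables if v not in assignment]
    if not free:
        return assignment                 # all variables assigned without conflict
    var = free[0]                         # naive splitting choice (placeholder)
    for value in (True, False):
        result = dll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None                           # both branches failed
```

Backtracking here is purely chronological; the refinements discussed in the following subsections replace the placeholder splitting rule and the plain backtracking step.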

SIMO’s strength is based on: (1) advanced backtracking mechanisms and (2) advanced
procedures for choosing the next variable on which to split the search. Besides the
standard backtracking to the last choice point, SIMO also implements
Conflict-directed Back-Jumping (CBJ) and CBJ-with-Learning [dec90a, pro93a, bs97a]. In
Section 2.2.1, we explain at a high level the CBJ-with-Learning algorithm, which was
chosen as the best setting following intensive benchmarking with real-life test
cases.

In the context of heuristics to choose the splitting variable, we compare several
dynamic heuristics and introduce a new dynamic heuristic, Unirel2, that proves to
be the best for the Intel bounded-model checking benchmark. Section 2.2.2 explains
at a high level the heuristics that have been compared and the experimental results
that justify our decision.

2.2.1 CBJ-with-Learning 

Since the basic DLL algorithm relies on simple chronological backtracking, and most 
heuristics are targeted to select the literal that satisfies the largest number of clauses, 
it is not infrequent for DLL implementations to get stuck in possibly large sub-trees 
whose leaves are all dead-ends. This phenomenon occurs when some selection 
performed way up in the search tree is responsible for the constraints to be violated. 
The solution, borrowed from constraint network solving [dec92], is to jump back 
over the selections that were not at the root of the conflict, whenever one is found. 
The corresponding technique is widely known as Conflict-directed Back-Jumping 
(CBJ) [pro93]. It has been reported by the authors of RELSAT [bs97], GRASP
[ss96] and SATO [zha97] that CBJ is a very effective technique for dealing with
real-world instances.

It turns out that in all these solvers, CBJ is tightly coupled with another technique,
called Learning. CBJ can be very effective in "shaking" the solver from a sub-tree
whose leaves are all dead ends, but since the cause of the conflict is discarded as soon
as it gets mended, the solver may get repeatedly stuck in such local minima. To
escape this pattern, some sort of global knowledge is needed: the causes of the
conflicts may be stored to avoid repeating the same mistake over and over again. This
process is usually called no-good or recursive learning. Our BMC experience with
SIMO agrees with previous work [bs97], which reports that CBJ with relevance learning
is essential for good performance in the domain of SAT.
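
The interplay of back-jumping and learning can be sketched as follows. This is a simplified illustration under stated assumptions, not SIMO's algorithm: it omits BCP, splits on the first free variable, and keeps every learned clause forever (relevance learning would discard some of them). The names `falsified` and `cbj` are hypothetical.

```python
def falsified(clause, assignment):
    """True if every literal of the clause is false under the assignment."""
    return all(abs(l) in assignment and assignment[abs(l)] != (l > 0) for l in clause)

def cbj(clauses, variables, assignment=None, learned=None):
    """Backtracking search with conflict-directed back-jumping and no-good learning.

    `variables` must list every variable that occurs in `clauses`.
    Returns (model, None) on success or (None, conflict_set) on failure, where
    conflict_set holds the assigned variables responsible for the failure.
    """
    assignment = {} if assignment is None else assignment
    learned = [] if learned is None else learned
    for clause in clauses + learned:
        if falsified(clause, assignment):
            return None, {abs(l) for l in clause}
    free = [v for v in variables if v not in assignment]
    if not free:
        return dict(assignment), None
    var = free[0]
    conflict = set()
    for value in (True, False):
        assignment[var] = value
        model, sub_conflict = cbj(clauses, variables, assignment, learned)
        del assignment[var]
        if model is not None:
            return model, None
        if var not in sub_conflict:
            # This choice played no part in the failure: jump back over it
            # without trying the other value.
            return None, sub_conflict
        conflict |= sub_conflict
    conflict.discard(var)
    # Learn the no-good: the current values of the conflict variables
    # cannot all hold together, so record the clause that forbids them.
    learned.append([-v if assignment[v] else v for v in sorted(conflict)])
    return None, conflict
```

When a sub-search fails for reasons that do not involve the current variable, the conflict set is passed straight up, which is the back-jump. The learned clause prevents the search from re-entering the same failing combination elsewhere in the tree.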

2.2.2 Splitting Heuristics 

The splitting heuristic needs to decide which variable to assign next from the set S of 
variables that were not assigned yet. Since the conversion to CNF [pg86] introduces 
many additional variables (one for each non-atomic sub-formula of the original 
formula) we restrict the set S to the variables of the original formula, also called 
relevant variables. As pointed out in [sht00], this optimization is very useful and our
results confirm this conclusion. 

SIMO features a static splitting heuristic that relies on a user-supplied order to
choose each splitting variable among the relevant variables. Additionally, SIMO has a
wide range of dynamic splitting heuristics that proved to be very effective in our
experience with bounded model checking.

SIMO's dynamic splitting heuristics fall broadly into two categories: BCP
heuristics and greedy heuristics. BCP heuristics choose the splitting variable by
tentatively assigning truth-values to (some of) the unassigned variables and then
performing BCP. In this way the exact amount of simplification produced by each
possible assignment can be calculated. Moreover, BCP heuristics can detect failed
literals, i.e., literals that once assigned produce a contradiction after a single sweep of
BCP. Greedy heuristics choose the splitting variable by estimating the amount of
simplification caused by an assignment. Relying on an estimate rather than an exact
calculation makes greedy heuristics faster than BCP heuristics, but also less precise
and incapable of detecting failed literals. In this regard, greedy heuristics can be seen
as an approximation to the BCP ones. Both types of heuristics branch on the variable
that produces, or is estimated to produce, the maximum simplification in the formula.
We used heuristics from both categories in our experiments with SIMO.

Among the greedy heuristics, we have used Moms and Morel heuristics. For each 
open variable p, Moms computes the number of binary clauses in which p occurs, 
and uses this quantity as the expected amount of simplification when assigning p. 
Morel works in the same way as Moms, but its choice is restricted to relevant 
variables only. 
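
A greedy choice of this kind can be sketched as below. The sketch assumes that "binary clauses" means clauses with exactly two open literals under the current partial assignment, and that ties are broken in favor of the smallest variable index; both are assumptions of this illustration, and `moms_choice` is a hypothetical name rather than a SIMO function.

```python
from collections import Counter

def moms_choice(clauses, assignment, relevant=None):
    """Pick the open variable that occurs in the most binary open clauses.

    Passing a collection of relevant variables restricts the choice, in the
    spirit of Morel. Returns None when no candidate variable is left.
    """
    counts = Counter()
    for clause in clauses:
        if any(abs(l) in assignment and assignment[abs(l)] == (l > 0) for l in clause):
            continue                          # clause already satisfied
        open_lits = [l for l in clause if abs(l) not in assignment]
        if len(open_lits) == 2:               # binary after simplification
            for lit in open_lits:
                counts[abs(lit)] += 1
    candidates = {abs(l) for c in clauses for l in c if abs(l) not in assignment}
    if relevant is not None:
        candidates &= set(relevant)
    if not candidates:
        return None
    # max() keeps the first maximum, so ties go to the smallest variable index.
    return max(sorted(candidates), key=lambda v: counts[v])
```

No propagation is performed, which is what makes the estimate cheap and also why a failed literal goes unnoticed.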

From the class of BCP heuristics, we have used three heuristics, called Unit,
Unirel, and Unirel2. For each open variable p, Unit tentatively assigns both p and
¬p: for both choices, BCP is performed and the number of unit-propagated variables
is collected. If the heuristic yields a contradiction by assigning p (resp. ¬p), then it
immediately assigns ¬p (resp. p); if ¬p (resp. p) also fails, then Unit halts and
backtracks, otherwise it goes on trying to select a variable. If all variables are
assigned during this process or all the clauses are satisfied, Unit reports that a
satisfying assignment was found. Unirel works in the same way as Unit, except that it
considers only relevant variables when collecting the number of unit-propagated
variables. Unirel2, on the other hand, tentatively assigns only relevant variables, but
it collects the number of all the unit-propagated variables.
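
The lookahead scheme shared by Unit, Unirel and Unirel2 can be sketched as follows. This is an illustrative reading of the description above, not SIMO's code: the scoring (sum of the propagated variables over both polarities), the restart of the scan after each failed literal, and the names `bcp` and `unit_choice` are assumptions of this sketch.

```python
def bcp(clauses, assignment):
    """Unit propagation on a copy; returns the extended assignment or None on conflict."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                              # clause already satisfied
            open_lits = [l for l in clause if abs(l) not in assignment]
            if not open_lits:
                return None                           # clause falsified
            if len(open_lits) == 1:                   # unit clause rule
                assignment[abs(open_lits[0])] = open_lits[0] > 0
                changed = True
    return assignment

def unit_choice(clauses, assignment, candidates, counted=None):
    """Lookahead splitting with failed-literal detection.

    candidates: variables that are tentatively assigned.
    counted:    variables whose propagation is counted (None counts all).
    Returns ("split", var, assignment), ("sat", None, assignment)
    or ("conflict", None, None). var may be None if no candidate is open.
    """
    assignment = bcp(clauses, assignment)
    if assignment is None:
        return "conflict", None, None
    restart = True
    while restart:
        restart = False
        best, best_score = None, -1
        for var in candidates:
            if var in assignment:
                continue
            pos = bcp(clauses, {**assignment, var: True})
            neg = bcp(clauses, {**assignment, var: False})
            if pos is None and neg is None:
                return "conflict", None, None         # both polarities fail
            if pos is None or neg is None:
                assignment = neg if pos is None else pos   # failed literal
                restart = True                        # earlier scores are stale
                break
            score = sum(1 for r in (pos, neg) for v in r
                        if v not in assignment and v != var
                        and (counted is None or v in counted))
            if score > best_score:
                best, best_score = var, score
    if all(any(assignment.get(abs(l)) == (l > 0) for l in c) for c in clauses):
        return "sat", None, assignment
    return "split", best, assignment
```

Under these assumptions, Unit corresponds to passing all variables as `candidates` with `counted=None`, Unirel to all variables with `counted` set to the relevant variables, and Unirel2 to passing only the relevant variables as `candidates` with `counted=None`.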

We compared the performance of the Moms, Morel, Unit, Unirel, Unirel2, and Static
heuristics in SIMO, making use of a benchmark of 26 real-life test cases. The
benchmark is evenly distributed between falsification and verification test cases.
The Unirel2 heuristic provides a clear performance and capacity boost over the other
heuristics. We chose to report only timings of the dynamic heuristics, since SIMO
does not include all the known static heuristics. The current static heuristics in SIMO
performed much worse than the dynamic heuristics for our benchmark. However, in
order to derive any reliable conclusions on the effectiveness of dynamic
heuristics versus static heuristics, SIMO needs to be enriched with the latest static
heuristics for bounded model checking [sht00].

For all the runs reported in Figure 1, we use a 3-hour time-out limit. As can be seen,
the Moms heuristic is significantly inferior to Unit and Unirel2 (except for circuit12).
On the other hand, Unirel2 provides a clear performance boost over the Unit heuristic.

In the analysis of the results, let us concentrate on three representative heuristics
drawn from the two categories: Moms, Unit and Unirel2. Moms is the basic and most
popular greedy heuristic. Unit is the simplest of the BCP-based heuristics, and Unirel2
is the overall fastest of the 6 (Static, Moms, Morel, Unit, Unirel, Unirel2) that we have
tried. Our results indicate clearly that BCP heuristics perform better than greedy
heuristics for this domain of problems. BCP heuristics take into account the structure
of the CNF formula, which closely reflects the structure of the original formula
(before the CNF conversion). Indeed, in the CNF formula there are (possibly long)
chains of implications. With BCP heuristics, a literal occurring at the top of a chain is
preferred to a literal occurring in the middle of the same chain. This is not guaranteed
to be the case with greedy heuristics, where only the number of occurrences counts.
Moreover, both Unit and Unirel2 feature the failed literal detection mechanism that
the Moms heuristic lacks. This mechanism allows Unit and Unirel2 to perform
more simplifications at each node.

Unirel2 considers only the relevant variables (i.e., the model variables), whereas
the Unit heuristic considers all the variables as candidates for splitting. Although the
greedy nature of the Unit heuristic makes it more accurate, in most cases Unit spends
much more time choosing a variable than Unirel2 (since the number of all
the variables can be significantly larger than the number of relevant variables).
Therefore, giving up a bit of quality provides better overall performance for Unirel2.



3 Forecast - A BDD-Based Symbolic Model Checker 

Several recent papers [bcrz99, bccz99, bccfz99, sht00] compare traditional BDD-based
model checking with SAT-based model checking, showing that in many cases
SAT technology dramatically outperforms BDD technology. Such comparisons
(except [bccz99]), however, neglect one crucial aspect that distinguishes the two
approaches. Traditional BDD-based model checking searches for counterexamples of
unbounded length. In contrast, SAT-based model checking searches for
counterexamples of a predetermined bounded length. Thus, prior comparisons leave
open the question whether the difference in performance is due to the underlying
technology (BDD vs. SAT) or to the difference between bounded and
unbounded model checking. To answer this question, we undertook the task of first
adapting a BDD-based model checker to bounded model checking and then comparing its
performance to a SAT-based model checker.




Fig. 1. Comparison of Thunder run-time with the dynamic heuristics Moms, Morel,
Unit, Unirel and Unirel2 for a benchmark of 26 test cases on a logarithmic scale. In
the reported runs, the time-out has been set to 3 hours. The x axis indicates the test
case and the y axis indicates the Thunder run-time. We can clearly see that the Moms
and Unit heuristics time out for 6 and 1 out of 26 test cases, respectively.



3.1 Adapting Forecast for Bounded Model Checking 

Forecast is a BDD-based model checker developed and deployed at Intel, using an
in-house BDD package. Forecast can run in two modes. In the standard mode,
Forecast applies either forward or backward breadth-first-search traversal from a
source set S to a target set T with respect to a transition relation TR, where Image
refers to a pre-image or post-image operation:

Traversal(S,T,R)

    Reach = Frontier = S
    while (Frontier ≠ ∅) {
        if (Frontier ∩ T ≠ ∅) terminate;
        Frontier := Image(Frontier,R) - Reach;
        Reach := Reach ∪ Frontier;
    }
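
As an explicit-state analogue of this loop, the following Python sketch replaces BDDs by ordinary sets of states. It is only an illustration: Forecast operates symbolically, and the dictionary-based transition relation, the post-image helper and the returned step count are assumptions of this sketch.

```python
def image(states, transitions):
    """Post-image: all successors of the given set of states."""
    return {t for s in states for t in transitions.get(s, ())}

def traversal(source, target, transitions):
    """Breadth-first reachability.

    Returns the number of image steps after which the frontier first
    intersects the target, or None if the target is unreachable.
    """
    reach = set(source)
    frontier = set(source)
    steps = 0
    while frontier:
        if frontier & target:
            return steps
        # Only states not seen before form the next frontier.
        frontier = image(frontier, transitions) - reach
        reach |= frontier
        steps += 1
    return None
```

Subtracting `reach` is what guarantees termination on a finite state space: once no new states appear, the frontier becomes empty and the loop stops.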



A difficulty often faced by standard traversal is the excessive growth of the
Frontier or the Reach set ("state explosion"). To address the former problem,
Forecast can apply a prioritized-traversal algorithm, see [fkzvf00]. In prioritized
traversal mode, we split the Frontier into two balanced parts when its BDD size
reaches some predetermined threshold. Thus, instead of maintaining one frontier, the
algorithm maintains several frontiers, organized in a priority queue. A given traversal
step consists of choosing one frontier set and applying the image operator to that set.
Thus, prioritized traversal can be viewed as a mixed breadth-first/depth-first search.

How can we adapt standard traversal and prioritized traversal to bounded model 
checking? The first change is to bound the length of the traversal. 



BoundedTraversal1(S,T,R,k)

    I := 0
    Reach = Frontier = S
    for (I = 0; I < k; I++) {
        if (Frontier ∩ T ≠ ∅) terminate;
        Frontier := Image(Frontier,R) - Reach;
        Reach := Reach ∪ Frontier;
    }



If the distance between S and T is less than or equal to k, then the running time of
BoundedTraversal1 and Traversal would clearly coincide. Note, however, that
termination is not an issue in bounded traversal. Thus, from a termination point of
view, there is no need to maintain Reach.



BoundedTraversal(S,T,R,k)

    Frontier = S
    for (I = 0; I < k; I++) {
        if (Frontier ∩ T ≠ ∅) terminate;
        Frontier := Image(Frontier,R);
    }



However, besides guaranteeing termination, Reach was used in the classic
algorithm to cut down on the size of Frontier. Thus, one would expect
BoundedTraversal to run into huge Frontiers, resulting in weak performance. This is
where prioritized traversal comes to the rescue. As before, we split the Frontier
whenever its BDD gets larger than some threshold, maintaining a set of frontiers in a
priority queue. With each frontier we maintain its distance from the source set S. We
choose frontiers and apply the image operator, making sure that the bound is never
exceeded. This results in a prioritized version of BoundedTraversal.
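
A prioritized, bounded traversal along these lines can be sketched with explicit sets of states. Several points are assumptions of this sketch rather than facts about Forecast: the size of a set stands in for BDD size, smaller frontiers are given priority, a frontier is split by sorting its states and cutting the list in half, and no Reach set is kept.

```python
import heapq
from itertools import count

def image(states, transitions):
    """Post-image: all successors of the given set of states."""
    return {t for s in states for t in transitions.get(s, ())}

def prioritized_bounded_traversal(source, target, transitions, k, threshold):
    """Return True if a target state is reachable from source in at most k steps."""
    tie = count()                        # tie-breaker so that sets are never compared
    queue = [(len(source), next(tie), 0, frozenset(source))]
    while queue:
        _, _, dist, frontier = heapq.heappop(queue)
        if frontier & target:
            return True
        if dist == k:
            continue                     # bound reached: do not expand further
        if len(frontier) > max(threshold, 1):
            # Frontier too large: split it into two balanced parts and
            # re-queue both at the same distance from the source.
            ordered = sorted(frontier)
            half = len(ordered) // 2
            for part in (frozenset(ordered[:half]), frozenset(ordered[half:])):
                heapq.heappush(queue, (len(part), next(tie), dist, part))
            continue
        successors = frozenset(image(frontier, transitions))
        if successors:
            heapq.heappush(queue, (len(successors), next(tie), dist + 1, successors))
    return False
```

Each queue entry carries its own distance, so the bound k is respected even though frontiers at different depths are interleaved. Termination follows from the bound alone, which is why the Reach set can be dropped.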

So far we have treated S and T in a symmetrical fashion. In practice, however, the 
initial states are defined in terms of many state variables, while the error state is 
defined in terms of a small number of state variables, called the error variables. 
Cone-of-influence (COI) reduction algorithms take advantage of that by eliminating 
state variables that cannot have any effect on the error variables. In the context of 
BMC, one can be more aggressive and eliminate variables that cannot have an effect 
on the error variables in a bounded number of clock cycles. This optimization, called
Bounded Cone-of-Influence (BCOI), was introduced in [bcrz99].
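
The difference between the two reductions can be illustrated on a dependency graph. In this sketch, `deps[v]` is assumed to be the set of state variables read by the next-state function of v; this representation and both function names are assumptions made for the illustration and are not taken from [bcrz99] or from Forecast.

```python
def cone_of_influence(error_vars, deps):
    """Classic COI: every variable that can ever affect the error variables."""
    cone, work = set(error_vars), list(error_vars)
    while work:
        v = work.pop()
        for u in deps.get(v, ()):
            if u not in cone:
                cone.add(u)
                work.append(u)
    return cone

def bounded_cone_of_influence(error_vars, deps, k):
    """BCOI: for each time frame 0..k, the variables that can still affect
    the error variables at frame k."""
    needed = [set() for _ in range(k + 1)]
    needed[k] = set(error_vars)
    for frame in range(k, 0, -1):
        # A variable matters at frame-1 if a needed variable at frame reads it.
        needed[frame - 1] = {u for v in needed[frame] for u in deps.get(v, ())}
    return needed
```

With a dependency chain err ← a ← b ← c ← d, the classic cone contains all five variables, whereas for k = 2 the bounded cone needs only err at frame 2, a at frame 1 and b at frame 0.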

Forecast has a “lazy” mode [yt00] that effectively applies a BCOI reduction. This
mode is effective only in backward traversal: for each pre-image, one identifies the
relevant variables appearing in the frontier and builds a smaller TR based on those
relevant variables. Naturally, this reduction is more effective when the Frontier has a
small number of state variables, e.g., when the Frontier is close to the set of error
states. We have adapted the “lazy model checking” mode of Forecast to BMC (i.e.,
we search for a counter-example for k pre-image steps).



3.2 Default Configuration for Forecast 

Since we built the bounded model checking benchmark from Intel’s internal
benchmark base+ of Forecast, every test case had the best setting for unbounded
Forecast, meaning:

• The right pruning directives to reduce the size of the model 

• The best initial order that the FV expert user could get 

• The best (CPU time-wise) configuration that the FV expert user could get 

The time spent by the FV expert to get to the best initial order and tool
configuration could not be derived from the benchmark. Furthermore, the
configuration in the benchmark base was for unbounded Forecast. In order to make a
fair comparison with Thunder, in search of a best default setting, we experimented
with three recent state-of-the-art algorithms of Forecast described in Section 3.1:
bounded prioritized traversal, unbounded prioritized traversal [fkzvf00], and bounded
lazy model checking. For all the runs a partitioned transition relation was used.

We present results achieved by Forecast under two different configurations. 

• Automatic : the initial variable order is automatically computed by a static 
variable ordering algorithm 

• Semi-automatic : the initial order is taken from the order that was calculated 
by previous runs of Forecast with dynamic reordering* 



+ All the properties verified were safety properties.

* This evaluation is similar to the RB2 configuration in [sht00a]; however, in our case the
order gets refined by the dynamic reordering output of more than one run of the model checker.

Both of the configurations were run with dynamic reordering with a threshold
of 500K BDD nodes (meaning dynamic reordering is turned on when the total
number of BDD nodes allocated exceeds 500K).



TestCase      Bound   Forecast     Forecast      Forecast
                      Lazy         Prioritized   Prioritized Unbounded
                      (secs)       (secs)        (secs)

Circuit 1     5       27.3         1340          114.1
Circuit 2     7       1.1          0.56          2.1
Circuit 3     7       2.1          15.00         106.1
Circuit 4     11      9.1          2233.00       6189.0
Circuit 5     11      TIMEOUT      107800.00     4196.2
Circuit 6     10      TIMEOUT      TIMEOUT       2354.1
Circuit 7     20      4187.2       TIMEOUT       2795.1
Circuit 8     28      TIMEOUT      TIMEOUT       TIMEOUT
Circuit 9     28      TIMEOUT      TIMEOUT       TIMEOUT
Circuit 10    8       TIMEOUT      TIMEOUT       2487.1
Circuit 11    8       TIMEOUT      TIMEOUT       2940.5
Circuit 12    10      TIMEOUT      TIMEOUT       5524.1
Circuit 13    37      TIMEOUT      TIMEOUT       TIMEOUT

Table 1. Automatic Setting Comparisons. Forecast performance comparisons for 
different configurations with automatically generated initial order. A time-out limit 
of 3 hours has been set. 

Table 1 and Table 2 summarize the time spent by Forecast in verifying these test
cases when a time limit of 3 hours has been set. All experiments were run on an HP
J6000 workstation with 2 gigabytes of memory. Table 1 reports Forecast runs when the
initial order is automatically generated by the tool and Table 2 reports the results
when Forecast is given a semi-manual order (i.e., the enhanced order is obtained by
running Forecast with dynamic ordering several times).

The bottom line of Table 2 is the criticality of “a good initial order” for good
performance of BDD-based model checking. Without a good order, Forecast is far
from being competitive. Although unbounded prioritized traversal does not
outperform the other two algorithms for the test cases that all three complete, we
selected it as the winning configuration for the automatic default setting, since it
times out much less often than the other two (only three times). Although the success
of unbounded prioritized traversal versus the bounded version is intriguing, we believe
it to be due to the better suitability of the initial variable orders to the unbounded
prioritized traversal.

Table 2 dilutes the effect of a bad initial order; however, still no winning
configuration for all or most of the test cases can be chosen, indicating the difficulty
of finding an always-winning setting for BDD-based model checkers. In Table 2, lazy
model checking beats the other two for the test cases that it can complete. On the
other hand, it cannot complete 6 verification cases in the time set. No clear winner
could be found between the bounded and unbounded versions of prioritized
traversal. Although the performance of prioritized traversal is worse than lazy model
checking for all the test cases where lazy model checking completes, it times out less
often (4 times).

As can be seen, no good (overall winning) default setting could be selected for
Forecast based on the results of Table 1 and Table 2. We have selected the
unbounded prioritized traversal as the default setting, since it is a clear winner for
Table 1 and does not perform worse than the others for Table 2; moreover, the setting
in Table 1 is fairer for comparison with the default setting of Thunder, since the
initial order selection time is included in the overall Thunder run time. Table 2
numbers do not include the time spent generating the initial order (i.e.,
the runs of symbolic model checking to generate good orders). Note, however, that
although unbounded prioritized traversal is not guaranteed to find a counter-example
of minimal length, for all the falsification test cases that we have tried a
counter-example of length k or less was reported.



TestCase     Bound   Forecast     Forecast               Forecast
                     Lazy         Prioritized Bounded    Prioritized Unbounded
                     (secs)       (secs)                 (secs)

Circuit 1    5       7.4          21.0                   21.8
Circuit 2    7       1.6          1.8                    1.9
Circuit 3    7       2.3          5.2                    5.6
Circuit 4    11      6.7          89.9                   241
Circuit 5    11      6432.5       64.2                   80.4
Circuit 6    10      TIMEOUT      44.6                   36.8
Circuit 7    20      134.3        7250.2                 TIMEOUT
Circuit 8    28      TIMEOUT      1421.1                 1287.5
Circuit 9    28      TIMEOUT      TIMEOUT                1040.3
Circuit 10   8       147.4        693.1                  694.6
Circuit 11   8       143.9        260.6                  261.0
Circuit 12   10      2379.2       4657.0                 1041.5
Circuit 13   37      TIMEOUT      TIMEOUT                4188.0
Circuit 14   41      TIMEOUT      1864.36                TIMEOUT
Circuit 15   12      423.1        TIMEOUT                TIMEOUT
Circuit 16   40      16.1         783.0                  TIMEOUT
Circuit 17   40      TIMEOUT      TIMEOUT                33.1

Table 2. Semi-automatic Setting Comparisons. Forecast performance comparisons for
different configurations with a semi-automatically generated good initial order.



4 Comparison of Thunder and Forecast 

We evaluated bounded Thunder versus bounded Forecast with respect to 
performance and capacity. For each of these, we studied the aspect of productivity. 

Our performance benchmark consists of 15 real-life falsification test cases. All
15 test cases were taken from the unbounded Forecast benchmark base (meaning that all
the test cases could be falsified at special settings of Forecast). Since the unbounded
version of Forecast finds counterexamples of minimal length, we knew beforehand the
minimal length k of the counterexamples that can be generated for each test case.
Therefore, we could generate for each test case a bounded verification version with
bound k-1. Furthermore, we added to our benchmark two hard verification cases where we
requested both Forecast and Thunder to verify that no counterexample exists. In this
manner, we evaluated the power of bounded Thunder versus the power of bounded Forecast
for both verification and falsification test cases.

We built the capacity benchmark (made up of 11 test cases) by eliminating the
pruning directives of some of the test cases in the performance benchmark, and we
added brand-new test cases clearly surpassing the capacity limits of Forecast and
other state-of-the-art model checkers (i.e., verification test cases with over 2000
sequential elements and inputs).

4.1 Analysis of Performance Benchmark Results 

Table 3 compares the performance of the default Thunder setting with the default and
tuned settings of Forecast. The default setting of Forecast (prioritized traversal +
dynamic reordering + partitioned transition relation + automatic initial ordering) is
far from being competitive. For the tuned Forecast results, we report the configuration
that worked best. None of the tuned configurations, except the ones explicitly marked,
activates dynamic reordering. As can be seen, they include variations of the transition
relation (tr part (partitioned), tr mono (monolithic)), variations of priorities (min size,
max states, BFS, DFS) for prioritized search, and variations of configurations for lazy
model checking.

The comparison of the default settings of Thunder and Forecast reveals that Forecast’s
default performance and capacity are far below Thunder’s. On the other hand, the
comparison results reveal that Thunder at its default setting provides performance
comparable to the tuned Forecast results. For 6 benchmarks out of 17, the Thunder default
setting beats the tuned Forecast results by 2 to 3X (see in Table 3 the
comparison on Circuits 3, 5, 8, 9, 10, and 11). For Circuit 13, Thunder's default
performance wins over Forecast's tuned performance by 9X. Nevertheless, the tuned
Forecast results are 2 to 3X better for Circuit 7 and Circuit 12. Thus, there is no clear
winner with respect to performance when default Thunder is compared with tuned
Forecast. The only conclusion is that Thunder gives a significant
productivity boost. In short, unlike Forecast, Thunder does not require a high tuning
effort to perform well.

Through the performance benchmark, we also tested the capacity of Thunder
versus Forecast. Three test cases that could be easily verified by a tuned Forecast
setting could not be verified by any heuristic of Thunder (Circuit 14 (bound 40, 41),
Circuit 16 (bound 40), Circuit 17 (bound 60)). Although Thunder could not solve
(except for Circuit 16) the bounded model checking problem for these test cases, it
could solve a variation of the problem (exact, exact-assume described in Section 2.1).
As seen in Figure 2, although the exact and exact-assume modes are significantly faster
than the bound mode, the problem solved is simpler. By exact-assume, we are
verifying the existence of a counterexample of exactly length k. Clearly, solving
k exact-assume verification cases, which check for the existence of a counterexample
of each length from 1 to k, is equivalent to solving the bound k problem.
Although this is too time consuming, the fact that Thunder could solve the exact-assume
problem for most of the test cases that are hard for the bound version indicates that
the solution of these problems is within the capacity range of Thunder.
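
The equivalence between one bound-k check and k exact-length checks can be illustrated on a small explicit-state example. This is only a sketch under stated assumptions: the toy transition graph, the names `EDGES`, `INIT`, `BAD`, `exact`, and `bound` are hypothetical, the sets are enumerated directly rather than encoded for a SAT solver, and the additional constraints of the exact-assume mode (described in Section 2.1) are not modelled. Initial states are assumed not to violate the property.

```python
from collections import deque

# Hypothetical toy transition system: state -> list of successor states.
EDGES = {0: [1], 1: [2, 0], 2: [3], 3: [3]}
INIT = {0}   # initial states
BAD = {3}    # states violating the property

def exact(k):
    """True if some path of exactly k transitions from an initial state ends in a bad state."""
    frontier = set(INIT)
    for _ in range(k):
        frontier = {t for s in frontier for t in EDGES[s]}
    return bool(frontier & BAD)

def bound(k):
    """True if a bad state is reachable within at most k transitions (breadth-first search)."""
    depth = {s: 0 for s in INIT}
    queue = deque(INIT)
    while queue:
        s = queue.popleft()
        if depth[s] == k:
            continue  # do not expand beyond the bound
        for t in EDGES[s]:
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return any(s in BAD for s in depth)

# The bound-k answer coincides with the disjunction of the exact checks for 1..k.
for k in range(1, 7):
    assert bound(k) == any(exact(j) for j in range(1, k + 1))
```

In this toy system the shortest counterexample has length 3, so `exact(1)` and `exact(2)` fail while `exact(3)` and `bound(3)` succeed.
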

4.2 Analysis of Capacity Benchmark Results

We generated the capacity benchmark by eliminating the pruning directives that had
been set to get the model checking cases through. The size of the test cases in the
capacity benchmark, which contain thousands of sequential elements and inputs, is
clearly far beyond the capacity of Forecast and any other state-of-the-art BDD-based
symbolic model checker. Therefore, no results are reported for Forecast. Thunder has
successfully verified a wide range of the test cases in the capacity benchmark,
indicating a clear win over Forecast for un-pruned test cases. The fact that Thunder
could verify these test cases without the extensive pruning effort required for a
BDD-based model checker is also a clear indication of the productivity gain achieved by
Thunder.

In Table 4, we report the CPU time of the overall run of Thunder for 11 test cases.
The test cases Circuit 1, 3, and 4 are the same test cases that were used for the
performance evaluation. For this benchmark, the pruning directives set by the user to
make the verification fit the capacity of BDD-based model checking have been
eliminated. We report the number of latches and inputs before and after the
application of the automatic pruning operation (cone-of-influence reduction with respect
to the property). As can be seen, using Thunder, test cases with over 9000 latches and
inputs could be verified without requiring any additional manual pruning effort. In
Table 4, one bounded model checking problem fed into the Thunder SAT engine
(Ncircuit 8) represents a verification case with a total of 6832 inputs and sequential
elements, yielding 121786 SAT variables and 358334 clauses. These results,
although in the domain of bounded model checking, are a clear indication of the
promise of this technology to establish model checking as a robust and popular
technique in industrial validation environments.
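
The automatic pruning step mentioned above, cone-of-influence reduction, keeps only the latches and inputs on which the property can depend. A minimal sketch in Python follows; the fan-in map and all signal names (`valid`, `full`, `count`, and so on) are hypothetical and not taken from the benchmark circuits, and this is not a description of how Thunder implements the reduction.

```python
def cone_of_influence(fanin, roots):
    """Return every node that the root nodes transitively depend on.

    fanin maps a node to the nodes it reads; nodes absent from the map
    (e.g. primary inputs) have no fan-in.
    """
    cone, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in cone:
            continue
        cone.add(node)
        stack.extend(fanin.get(node, ()))
    return cone

# Hypothetical design: the property only mentions the signal "valid".
fanin = {
    "valid": ["full", "req"],
    "full": ["count"],
    "count": ["count", "req"],   # latch with feedback
    "parity": ["data"],          # unrelated to the property
    "data": ["data_in"],
}

cone = cone_of_influence(fanin, ["valid"])
pruned = set(fanin) - cone       # nodes that can be dropped from the model
print(sorted(cone), sorted(pruned))
```

Everything outside the cone (`parity` and `data` here) cannot affect the property and can be removed before the model is handed to the checker.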



Fig. 2. Performance comparison results of bound and exact-assume modes of
Thunder for the same k. The x axis represents the test cases and the y axis
represents Thunder run-time in seconds.

TestCase    Bound  Variables, Clauses  Thunder Default                  Bounded Forecast  Bounded Forecast Tuned (secs),
                   in Thunder          (secs)                           Default (secs)    Configuration

Circuit 1   5      5055, 14690         2.43                             114               2.80, lazy
            4      3987, 11559         1.59
Circuit 2   7      2000, 5727          0.81                             2                 0.56, lazy
            6      1688, 4820          0.64
Circuit 3   7      3419, 8977          2.01                             106               1.29, lazy
            6      2908, 7623          1.17
Circuit 4   11     6740, 18884         1.91                             6189              1.04, lazy
            10     6085, 17030         1.53
Circuit 5   11     10258, 29515        10.12                            4196              35.14, unbounded-prio, tr part
            10     9303, 26746         8.78
Circuit 6   10     8829, 25587         5.51                             2354              8.34, unbounded-prio, tr part
            9      7918, 22927         4.85
Circuit 7   20     28769, 85033        236.29                           2795              76.88, unbounded-prio: minimum size
            19     27316, 80732        140.65
Circuit 8   28     38836, 116803       45.66                            TIMEOUT           141.00, unbounded-prio: BFS, tr mono
            27     37427, 112558       52.85
Circuit 9   28     37451, 112465       39.96                            TIMEOUT           85.50, unbounded-prio: max states
            27     36092, 108377       50.65
Circuit 10  8      8734, 25631         5.01                             2487              13.90, lazy
            7      7517, 22031         5.79
Circuit 11  8      8734, 25631         5.01                             2940              13.89, lazy
            7      7517, 22031         5.76
Circuit 12  10     8331, 24497         378.05                           5524              159.20, unbounded-prio, tr part
            9      7429, 21826         139.47
Circuit 13  37     60779, 169824       195.15                           TIMEOUT           1586.00, unbounded-prio
            36     59118, 165175       217.75
Circuit 14  41     51917, 154061       TIMEOUT; 91.9 (exact-assume)     TIMEOUT           833.96, unbounded-prio max states
            40     50616, 150220       TIMEOUT; 83.88 (exact-assume)
Circuit 15  12     9894, 29138         1070.65                          TIMEOUT           17.31, unbounded-prio
            11                         4209.1
Circuit 16  40     40718, 114344       TIMEOUT (exact-assume)           TIMEOUT           16.1, lazy + reorder
            20     20000, 56009        22.03
Circuit 17  60     123323, 356126      TIMEOUT; 4652.76 (exact-assume)  TIMEOUT           3657.3, tr part, forward reach
            20     41968, 120996       247.27

Table 3. Performance comparison results of default Thunder versus default and tuned
Forecast. For Forecast, no timing for bound k-1 is reported (clearly it is less than the
time reported for bound k).









Unpruned     Bound  Num. Latches + Inputs  Num. Latches + Inputs  Variables, Clauses  Thunder
Test Cases          before Automatic       after Automatic                            time (secs)
                    Pruning                Pruning

Circuit 1    5      12011                  152                    6831, 19759         6.1
             4      12011                  152                    5403, 15591         5.1
Circuit 3    7      7054                   0.81                   24487, 65332        96.1
             6      7054                   0.64                   200552, 54774       16.37
Circuit 4    11     6586                   2.01                   119248, 353400      78.61
             10     6586                   1.17                   107838, 319404      68.2
Ncircuit 6   5      9704                   1.91                   21351, 61499        29.39
Ncircuit 7   5      17262                  1.53                   TIMEOUT             TIMEOUT
Ncircuit 8   6      6832                   10.12                  121786, 358334      576.24
Ncircuit 9   11     3321                   8.78                   35752, 105268       73.32
Ncircuit 10  6      1457                   5.51                   50578, 149668       267.91

Table 4. Results from the Capacity Benchmark.

5 Conclusions 

In this paper, we have reported our effort to develop industrial-strength BMC and the
impressive productivity gain achieved by using SAT-based BMC (Thunder) versus
BDD-based BMC (Forecast). This gain is achieved by a drastic reduction in the
user ingenuity and tuning effort required to run the tools. Our work agrees with
previous work [bccz99, bcrz99, sht00] in the observation that SAT-based BMC can
outperform BDD-based BMC. We show that this statement holds mainly when
comparing SAT-based BMC with untuned BDD-based BMC, which supports our
conclusion on the productivity boost of SAT. Moreover, the evaluation of SAT-based
BMC on verification test cases with thousands of inputs and sequential elements
reveals its outstanding capacity to verify designs far beyond the capacity range of
state-of-the-art BDD-based model checkers.

The tuning effort that we invested to get the best default setting for SAT-based
BMC introduced a new dynamic heuristic, Unirel2, which is a winner on Intel’s
bounded model checking benchmark, supporting the statement made on the
productivity gain achieved by Thunder over Forecast.

Abstract. We describe the techniques we have used to search for bugs in
the memory subsystem of a next-generation Alpha microprocessor. Our 
approach is based on two model checking methods that use satisfiability 
(SAT) solvers rather than binary decision diagrams (BDDs). 

We show that the first method, bounded model checking, can reduce the 
verification runtime from days to minutes on real, deep, microprocessor 
bugs when compared to a state-of-the-art BDD-based model checker. 
We also present experimental results showing that the second method, a 
version of symbolic trajectory evaluation that uses SAT-solvers instead 
of BDDs, can find as deep bugs, with even shorter runtimes. The tradeoff 
is that we have to spend more time writing specifications. 

Finally, we present our experiences with the two SAT-solvers that we 
have used, and give guidelines for applying a combination of bounded 
model checking and symbolic trajectory evaluation to industrial strength 
verification. 

The bugs we have found are significantly more complex than those previously
found with methods based on SAT-solvers.



1 Introduction 

Getting microprocessors right is a hard problem, with harsh punishments for 
failure. With current design methods, hundreds to thousands of bugs must be 
found and removed during the design of a new processor, and there are heavy 
economic incentives to get most of them out before first silicon. 

Current designs are so complex that simulation-based methods are no longer 
adequate. Most companies in the industry, including at least AMD, Compaq, HP, 
IBM, Intel, Motorola, and Sun, have therefore investigated formal verification. 
Their choices of methods, tools, and application areas have varied, as has their 
level of success. 

One of the areas we have concentrated on at Compaq is property verification 
for our microprocessor designs. Among other things, we have investigated the 
use of symbolic model checking to find Register Transfer Level (RTL) bugs in
a next-generation Alpha processor. Our goal in this work has been to find bugs, 
rather than to prove their absence, since there are many bugs to find in a design 
under development. 

Our initial experiments with symbolic model checking convinced us that the
capacity limits of many model checkers prevent us from finding bugs cost-effectively.
The best model checker we could find, an experimental version of
Cadence SMV, needs several hours to days to check simple properties of
heavily reduced components. As a consequence, we have also looked at model
checking using satisfiability (SAT) solvers. These methods have shown
real promise, especially for finding bugs, when compared to BDD-based model
checkers like SMV.

In this paper, we describe how we have applied two SAT-based verification 
techniques to find real bugs in the memory subsystem of the Alpha chip. The 
first technique, bounded model checking (BMC), has previously been applied
to industrial verification, but not for finding bugs of length anywhere near what
we will describe. The second of these techniques, symbolic trajectory evaluation
(STE), has not previously been used together with SAT-solvers at all.

We compare the performance of SAT-based bounded model checking to
state-of-the-art BDD-based model checking, and present results showing the
usefulness of SAT-based STE. Our experiences are very positive: the use of
SAT-based methods has reduced the time for finding certain bugs from days to a few
minutes. We also compare the performance, when finding bugs in real designs,
of the two SAT-solvers we have used: GRASP, and Prover Technology’s
Prover proof engine. Finally, we present guidelines for applying a
combination of BMC and SAT-based STE to microprocessor bug finding.



Related Work. Bounded model checking (BMC) was invented by Biere and
coworkers as a method for using SAT-solvers to do model checking. BMC has
previously been applied to bug finding for Power PC chips. To our knowledge,
BMC is the only SAT-based model checking method that has been used in
realistic microprocessor verification.

In the Power PC verification, the authors did not model the environment of 
the designs under analysis. BMC quickly found short counterexamples to the 
properties being verified, but they were false failures due to illegal input sequences.
BMC did well at this compared to BDD-based model checking, but the
results said little about whether BMC could find real bugs, which are generally 
much deeper. We, on the other hand, present the results of searching for, and 
finding, real, deep bugs. One of our important contributions is therefore that we 
demonstrate that BMC together with cutting edge SAT-solvers has the capacity 
to find realistic bugs in industrial designs. 

Symbolic trajectory evaluation (STE) is a model checking method invented
by Seger and Bryant [12] that consists of an interesting mix of abstract
interpretation and symbolic evaluation. STE is in industrial use, primarily for data
path and memory verification, at companies including Intel and Motorola.
Up to now, STE has always been implemented using BDDs; the use of SAT-solvers
to do STE has not been reported previously in the literature. Moreover,
we apply symbolic trajectory evaluation to verification at the synchronous gate
level, a fairly high level of abstraction for STE, which has previously been used
predominantly at the transistor level.

There are other ways of doing SAT-based model checking than the ones
that we discuss in this paper. We refer readers interested in these alternative 
approaches to j2|liJZ05|. 

The paper is organised as follows. In Sections 3 and 4 we give brief introductions
to BMC and STE. We then describe the component that we have focused on, the 
merge buffer, and the process we have used to analyse it. After that, we go on to 
describe the actual use of the verification tools and the results. Finally, we give 
guidelines for using a combination of BMC and STE for heavy-duty industrial 
verification. 

2 Preliminaries 

In this paper, we will search for counterexamples to properties of synchronous
gate-level hardware. Such circuits can be viewed as finite transition systems,
where the states are value assignments to a vector s = (s.0, ..., s.n) of boolean
variables called the system’s state variables. The transition system for a given
circuit can be represented as two propositional logic formulas [2]:

    Init(s)         Initial states formula
    Trans(s, s')    Transition relation formula

The first formula, Init, is a formula that characterises the initial states by
evaluating to true exactly for the assignments to the state variables that are
initial states. The second formula, Trans, evaluates to true for s and s' precisely
when there is a transition from the state assigned to s to the state assigned to
s'.
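
To make the two formulas concrete, the sketch below writes Init and Trans as Python predicates for a hypothetical two-bit counter (not a circuit from this paper). The state vector is the pair (s0, s1), with s0 assumed to be the low bit; the names `init`, `trans`, and `STATES` are illustrative.

```python
from itertools import product

# All value assignments to the state vector s = (s0, s1).
STATES = list(product([False, True], repeat=2))

def init(s):
    """Init(s): true exactly for the initial state, where both bits are 0."""
    return not s[0] and not s[1]

def trans(s, t):
    """Trans(s, s'): true exactly when t is the counter value following s."""
    return t[0] == (not s[0]) and t[1] == (s[1] != s[0])  # != on booleans is xor

# Enumerating the predicates recovers the explicit transition system.
initial_states = [s for s in STATES if init(s)]
transitions = [(s, t) for s in STATES for t in STATES if trans(s, t)]
print(initial_states)
print(transitions)
```

For real circuits the formulas are far too large to enumerate like this; they are kept in symbolic form and handed to a BDD package or a SAT-solver.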

Our analyses take as inputs the formulas Init and Trans together with a 
description of a property to check. Such a property might for example be “a 
store instruction to an IO address is never discarded.” The aim of the analyses
is then to generate a trace, if one exists, where an IO store is thrown away.

In the case of BMC, we will specifically focus on detecting failures of safety 
properties. Informally, safety properties are properties of the form “in every 
reachable state of the system, the property P holds.” 

3 Bounded Model Checking 

Bounded model checking tries to find bugs in a system by constructing a formula 
that is satisfiable precisely if there exists a length N or shorter trace violating a 
property given by the user. The BMC procedure feeds this formula to an external 
SAT-solver, and uses the returned assignment (if any) to extract a failure trace. 

The bound N is given by the user, and will affect both the size of the generated
formulas, and the length of the failure trace that can be detected. A negative
answer from the SAT-solver for a given N does not mean that the whole system 
is safe, only that there are no failure traces of length N or shorter. BMC is thus 
used to find bugs, rather than to prove their absence. 

We assume that the safety property we are interested in has been encoded
as a propositional logic formula Prop(s) that will evaluate to true exactly for
the states fulfilling the property. Given the bound N, and the formulas Init(s),
Trans(s, s'), and Prop(s), the BMC procedure constructs the following formula,
which characterises failure traces of length N or shorter:

\[
Init(s_1) \land Trans(s_1, s_2) \land \dots \land Trans(s_{N-1}, s_N) \land
\bigl(\lnot Prop(s_1) \lor \dots \lor \lnot Prop(s_N)\bigr)
\]

If the SAT-solver returns an assignment to the state variables in $s_1, \dots, s_N$ that
makes this formula true, then there exists an initial state $s_1$ in the system, from
which we can reach another state $s_k$ ($k \in \{1, \dots, N\}$) where the property fails.
The BMC procedure can thus extract a failure trace from the assignment.
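
The construction can be illustrated with a small sketch in Python. Everything here is an assumption made for illustration: the system is a hypothetical two-bit counter, the property says that the counter never reaches the value 3, and exhaustive enumeration of assignments stands in for the external SAT-solver, which is only feasible for toy examples.

```python
from itertools import product

def init(s):
    return s == (False, False)

def trans(s, t):
    # Two-bit counter, s[0] is the low bit; != on booleans is xor.
    return t == (not s[0], s[1] != s[0])

def prop(s):
    # Safety property: the counter never holds the value 3.
    return not (s[0] and s[1])

def bmc(n):
    """Search for an assignment to s_1..s_n satisfying the BMC formula.

    Returns the failure trace up to the first violating state, or None
    if the formula is unsatisfiable for this bound.
    """
    states = list(product([False, True], repeat=2))
    for trace in product(states, repeat=n):
        if (init(trace[0])
                and all(trans(trace[i], trace[i + 1]) for i in range(n - 1))
                and any(not prop(s) for s in trace)):
            first_bad = next(i for i, s in enumerate(trace) if not prop(s))
            return trace[:first_bad + 1]
    return None

print(bmc(3))  # no failure trace with 3 states
print(bmc(4))  # the counter reaches 3 in the fourth state
```

As the text notes, the negative answer for N = 3 says nothing about longer traces; raising the bound to 4 exposes the failure.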

4 Symbolic Trajectory Evaluation 

A symbolic trajectory evaluator takes Trans(s, s') as input together with a so-called
trajectory assertion of the form Ant ⇒ Cons. The antecedent and consequent
of the trajectory assertion, Ant and Cons, are lists of equal length, in
each of which the ith entry says something about the system’s state variables at
time i. Informally, a trajectory assertion will be true with respect to a system
if a trace of the system that agrees with the antecedent necessarily must agree
with the consequent. The objective of symbolic trajectory evaluation is to generate
a failure trace for the system that satisfies the antecedent, and violates the
consequent.

As an example, assume that we have constructed a circuit whose state variables
s.a and s.b should contain the or and the and, respectively, of the current
and previous value of the state variable s.i. The following trajectory assertion
specifies this property:

    [node s.i is x, node s.i is y]
        ⇒
    [(•), node s.a is x ∨ y and node s.b is x ∧ y]

Here (•) means “no requirements on the state variables”, so the assertion can be
read, “if we have a trace of the system where s.i contains the value x at some
time t, and s.i contains the value y at time t + 1, then at time t + 1 s.a and s.b
contain the logical or and the logical and of x and y, respectively.”

In order to generate a failure trace, the trajectory evaluator first computes a
boolean expression ok over the user-introduced variables x and y. This expression
has the property that it evaluates to true for the assignments to x and y for
which the antecedent guarantees the consequent (and no others). A key element
of symbolic trajectory evaluation is that ok is constructed by symbolic reasoning
in a four-valued logic. In addition to the two standard values True and False,
the four-valued logic contains the values X (unknown), and T (overspecified).
The value X is used to model unknown contents of state variables, and the value
T is used to model the contents of state variables that are required to contain
two different values at the same time.

When ok has been computed, the evaluator uses an external SAT-solver to 
check whether there exists an assignment to x and y that makes ok evaluate 
to false. If there exists such an assignment, there is a trace of the circuit that 
is consistent with the antecedent but violates the consequent. The trajectory 
evaluator then instantiates x and y with the falsifying values, and constructs a 
failure trace that is given back to the user. 
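
The following sketch illustrates the idea on the or/and example above. It is not the evaluator used in this work: the dual-rail encoding of the four values, the helper names, and the deliberately buggy circuit are all assumptions made for illustration, and plain enumeration of x and y stands in for the symbolic construction of ok and the SAT-solver call.

```python
from itertools import product

# Assumed dual-rail encoding: (constrained_to_1, constrained_to_0).
X, ONE, ZERO, TOP = (False, False), (True, False), (False, True), (True, True)

def enc(b):
    return ONE if b else ZERO

def v_not(a):
    return (a[1], a[0])

def v_and(a, b):
    return (a[0] and b[0], a[1] or b[1])

def v_or(a, b):
    return (a[0] or b[0], a[1] and b[1])

def leq(a, b):
    """Information order: b carries at least the information in a."""
    return (not a[0] or b[0]) and (not a[1] or b[1])

def good_circuit(i, prev):
    """Returns (s.a, s.b) from the current and previous value of s.i."""
    return v_or(i, prev), v_and(i, prev)

def buggy_circuit(i, prev):
    """Hypothetical bug: s.b computes xor instead of and."""
    xor = v_or(v_and(i, v_not(prev)), v_and(v_not(i), prev))
    return v_or(i, prev), xor

def ok(circuit, x, y):
    # Time t: s.i = x. Time t+1: s.i = y and the previous value is x.
    a, b = circuit(enc(y), enc(x))
    return leq(enc(x or y), a) and leq(enc(x and y), b)

def falsify(circuit):
    """Return an assignment to (x, y) that makes ok false, or None."""
    for x, y in product([False, True], repeat=2):
        if not ok(circuit, x, y):
            return x, y
    return None

print(falsify(good_circuit))   # the assertion holds for every x and y
print(falsify(buggy_circuit))  # a falsifying assignment for the buggy circuit
print(v_or(ZERO, X), v_or(ONE, X))  # X propagates unless the other operand decides
```

The last line shows why the unknown value is useful: an or-gate with one input at True yields True even if the other input is unconstrained, so the antecedent only needs to mention the state variables that matter.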

5 The Merge Buffer 

Alpha processors, like most state-of-the-art microprocessors, have a very hierarchical
structure. A processor is divided into a handful of so-called boxes, each
responsible for dealing with a particular aspect of instruction execution. For
example, the IBox handles instruction fetch, and the MBox executes
memory-reference instructions. Each box is further divided into a handful of parts that
we will call subboxes.

The subbox that is the focus of our attention in this paper is the merge buffer, 
an important component of the MBox for a next-generation Alpha chip. We chose 
the merge buffer as it is one of the most complex subboxes in the processor. Our 
hope is that if we can cost-effectively find bugs in this component, then we can 
use the same methods on most other subboxes. 

The function of the merge buffer is to receive requests to write into memory, 
and to reduce the traffic on the memory bus by merging stores to the same physical
address. In order to do the merging correctly, the merge buffer communicates
with four other subboxes: (1) the store queue, where store instructions are saved 
until they are written out of the merge buffer; (2) the load queue, where load 
instructions are stored until they have received results from memory; (3) the 
CBox, which deals with the cache coherence protocol; and (4) the backend tag 
module. 

The merge buffer is essentially a large buffer with a very complex policy for 
reading in entries, merging stores, and writing out stores to the memory. It has 
about 14 400 latches, 400 primary inputs, and 15 pipeline stages. The pipeline 
has complex feedback that prevents us from retiming away latches. 

6 Analysis Cycle 

In Figure 1 we show the analysis cycle that we have used to locate bugs in the
merge buffer. 

We start off with the original RTL description of the circuit. As the full-size
merge buffer contains more than ten thousand latches (too much state to be
feasible to verify using standard model checking technology), we need to reduce
the size of the model. The idea is to remove portions of the state in the circuit
in ways that do not alter the circuit behaviour with respect to the properties of
interest. The most important reductions are symmetry reductions, which we
use to reduce the number of buffer entries, address bits per entry, data bytes per
entry, and bits per data byte.

Fig. 1. Our Verification Flow.

We do not mind if some of our reductions do not preserve all possible properties
of the circuit, as long as we can find problems in the reduced circuit that also
are present in the full size circuit. The reason for this is that we are interested 
in finding bugs, as opposed to proving correctness. We are thus permitted to 
do ad-hoc reductions that are formally incorrect, but that preserve most of the 
interesting behaviour of the circuit. 

After the reductions, the merge buffer has about 40 primary inputs. When 
the merge buffer is in use, these inputs will be connected to the four subboxes 
with which the merge buffer communicates. If we leave them unrestricted, the 
verification will be done under the assumption that any inputs can occur at any 
time. However, in order to function correctly, the merge buffer relies on assumptions
about the behaviour of its environment. We therefore have to restrict the
input to the merge buffer by adding transactor state machines that provide a 
verification environment that rules out input behaviours that could not arise in 
real use. 

We then abstract the resulting circuit in two ways. First, we use an RTL
compiler to optimise the circuit by performing transformations like constant
propagation and common subexpression elimination. The reduced merge buffer
now has about 1800 latches and 10 free primary inputs. We then do a final
abstraction step that removes redundant latches, and replaces groups of transparent
latches with standard flip-flops (a single transparent latch cannot be modelled
synchronously, but we can often model clusters of transparent latches). The final
model has about 600 state nodes in the cone of most properties.

The end result of the reductions and abstractions is the model that we give 
to the verification tools. However, before we can do that, we need to write down 
the property of interest in a format that the tool we want to use accepts. Given 
the model and the property, the verification tool then either produces a failure 
trace, or tells us that the property is true (which has little meaning as we have 
performed ad-hoc reductions). 

A lot of design knowledge is needed to decipher a failure trace; a property can 
fail for more than one reason. First of all, we might have made a specification 
mistake that causes the tool to diagnose an intended behaviour of the system as 
a failure. In this case we need to modify the property. Second, the trace might be 
a trace that the real system could not exhibit, because it has arisen due to the 
merge buffer’s environment providing input signals that cannot occur in real life.
In this case we need to go back and modify the transactors so that we disallow 
this behaviour, and re-abstract the resulting model. Third, we might have found 
a real bug. 



7 Verification 

In this section, we describe our experiences of applying BDD-based symbolic 
model checking, BMC, and STE to the merge buffer. The areas of the merge 
buffer that we target have previously been well explored with simulation-based 
verification. 



7.1 BDD-Based Symbolic Model Checking 

SMV was the first BDD-based tool that we evaluated that showed some promise 
for checking non-trivial merge buffer properties. (We have evaluated several.) 
However, most of the interesting merge-buffer properties contain about 600 
latches in the cone of influence, and BDD-based model checking of state machines
containing more than a couple of hundred latches is highly non-trivial.
In order to find bugs using SMV, we therefore have to decrease the size of the 
cone by setting a subset of the 10 free primary inputs to specific values during 
the run. These values restrict the part of the state space that we explore using 
the model checker. 

In order to get better performance out of SMV, we have ported it to the 
64-bit Alpha architecture. This allows us the benefits of performing the model 
checking runs on a high performance server with 8 GB of main memory. To 
further improve SMV’s capacity, we have also augmented the standard variable
reordering heuristics with two special purpose tactics. 

In spite of the improvements to SMV, each property still takes several hours 
to explore on the server. We have found many bugs this way, but it is slow. 

7.2 Bounded Model Checking 

The first alternative to BDD-based model checking that we have investigated 
is bounded model checking, as implemented in the SAT-based model checking 
workbench FixIt [2].

One of the SAT-solvers that we wanted to use together with FixIt, Prover,
was not available for the Alpha architecture when this work was done. We
have therefore done all of our BMC runs on a 32-bit PC. The performance of
the BMC analysis is still remarkable. Even though we are not using a
high-performance processor with many gigabytes of memory, we can find failures in
a fraction of the time needed by SMV. In Table 1 we compare the runtimes of
BMC, running on a 450 MHz 32-bit PC, to SMV, running on a 700 MHz 64-bit
Alpha.

Table 1. Comparison between Bounded Model Checking and SMV (runtimes in seconds).

Failure length    SMV       Captain Prove BMC    GRASP BMC
     25           62 280            85                   25
     26           32 940            19                   19
     34           11 290           586                  272
     38           18 600            39                  101
     53           54 360         1 995           [>10000 s]
     56           44 640         2 337           [>10000 s]
     76           27 130           619                6 150
    144           44 550        10 820           [>10000 s]

The first column of BMC runtimes is obtained using CAPTAIN PROVE, a
command-line tool from Prover Technology. CAPTAIN PROVE uses Prover's
application programming interface [11] to search for models using strategies. A
simple such strategy, which we will refer to as the timed strategy, looks as follows:

sat 1 time 3600. 

back level 5 [ sat 1 time 30. ] . 

The timed strategy first does a preprocessing step called 1-saturation [14] for
3600 seconds. This analysis tries to find information restricting the search space
we have to traverse for a model. The 1-saturation is then followed by the actual
search, backtracking. At every fifth level of the search tree, the SAT-solver is
instructed to do 30 seconds of additional 1-saturation.

The use of strategies allows us to control the search for assignments. We use 
different choices of strategies for different bounds N . When N is less than 40, we 
use the default strategy of 1-saturation without a time limit followed by normal 
backtracking. For N larger than 40, we use the timed strategy with different 
values for the initial 1-saturation. For example, for length 60 traces we normally 
need 1000 seconds of initial saturation, whereas for traces over 100 cycles long 
we use 10 000 or 20 000 seconds of initial saturation. 

As can be seen from Table 1, BMC using CAPTAIN PROVE detects the failures
significantly faster than SMV. In some cases it reduces the runtime for finding a 
bug from a day to a couple of minutes. The lengths of failures that are detected 
range from 25 cycles up to well over a hundred cycles. 
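The size of the gap can be made concrete by dividing the SMV runtime by the CAPTAIN PROVE runtime for each row of Table 1. The following is a small illustrative calculation, not part of the original tool flow; the function name `speedups` is ours, and the numbers are copied from the table.

```python
# Runtimes in seconds copied from Table 1:
# (failure length, SMV, Captain Prove BMC).
TABLE_1 = [
    (25, 62280, 85),
    (26, 32940, 19),
    (34, 11290, 586),
    (38, 18600, 39),
    (53, 54360, 1995),
    (56, 44640, 2337),
    (76, 27130, 619),
    (144, 44550, 10820),
]


def speedups(rows):
    """Return (failure length, SMV time / BMC time) for each table row."""
    return [(length, smv / bmc) for length, smv, bmc in rows]


if __name__ == "__main__":
    for length, ratio in speedups(TABLE_1):
        print(f"failure length {length:3d}: SMV / Captain Prove = {ratio:7.1f}x")
```

The ratio varies widely from row to row: it is largest for the short failures and smallest for the 144-cycle failure, where the long initial saturation dominates the BMC runtime.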

The second column of BMC runtimes is obtained using GRASP [15], a
high-capacity public domain SAT-solver. As can be seen in the table, CAPTAIN PROVE
and GRASP both work well for short failures. For longer failures, CAPTAIN
PROVE outperforms GRASP. (Please note that the reason for the [>10000 s] table
entries is that GRASP automatically terminates after 10 000 seconds; we have
not cut it off.)

7.3 SAT-Based Symbolic Trajectory Evaluation 

The second alternative to BDD-based model checking that we have investigated
is a SAT-based version of symbolic trajectory evaluation that we have
implemented in FixIt.

The advantage of using STE instead of BMC is that we are not forced to give
symbolic values to each time-instance of a state variable. Instead we can choose 
to give concrete values to some state variables, or leave them to contain X. This 
potentially permits us to do much deeper exploration of the state-space than we 
can do using BMC, while preserving the short run times. 

However, in order to take full advantage of this increased flexibility, we have 
to spend more time coming up with a good specification that judiciously gives 
concrete and symbolic values to the right variables. 

For example, if we do not give concrete or symbolic values to some of the state 
variables, they are initialised to contain the unknown value X . This value often 
propagates, since it may be impossible to draw conclusions about the outputs 
of a gate with an unknown input. We might also have forgotten to assign a 
value to a primary input at an important time. When a property fails because 
of such underspecification, we have to make the specification more detailed by 
introducing symbolic or concrete values. A given STE specification will thus 
often have to go through several iterations of revision. 
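The way an unknown input propagates, or is masked, can be sketched with ordinary three-valued gate semantics. This is a minimal illustration under our own assumptions, not the FixIt implementation: the encoding of X as a string and the one-gate circuit `grant` with inputs `req` and `busy` are hypothetical.

```python
X = "X"  # the unknown value, distinct from the concrete values 0 and 1


def t_not(a):
    """Three-valued NOT: an unknown input gives an unknown output."""
    return X if a == X else 1 - a


def t_and(a, b):
    """Three-valued AND: a concrete 0 decides the output even if the other input is X."""
    if a == 0 or b == 0:
        return 0
    if a == X or b == X:
        return X
    return 1


def t_or(a, b):
    """Three-valued OR: a concrete 1 decides the output even if the other input is X."""
    if a == 1 or b == 1:
        return 1
    if a == X or b == X:
        return X
    return 0


def grant(req, busy):
    """Hypothetical circuit: grant = req AND (NOT busy)."""
    return t_and(req, t_not(busy))


if __name__ == "__main__":
    print(grant(1, X))  # busy left unspecified: the output is unknown
    print(grant(0, X))  # req = 0 masks the unknown input
    print(grant(1, 0))  # fully specified inputs give a concrete output
```

When the checked output evaluates to X, the assertion can be neither confirmed nor refuted, which is the underspecification case described above; assigning a concrete or symbolic value to `busy` resolves it.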



Table 2. Runtimes for detecting failures using symbolic trajectory evaluation (runtimes in seconds).

Failure length    Captain Prove    GRASP
     77                7.7          33.3
     77                7.7          34.2
    112               10.8          51.9
    123               11.7          51.9


In Table 2 we present the runtimes needed to find four bugs in the merge
buffer using STE. The times to do the actual detections are short, but we had to 
spend a lot of time developing the specifications. Luckily, the turnaround time 
for discovering that an assertion is underspecified is a few seconds at most, which 
means that the specification work is very interactive. 

The table shows a clear difference between the performance of STE using 
GRASP and CAPTAIN PROVE. However, the actual runtimes are very low in 
both cases. For the purpose of using SAT-based STE to locate bugs in the merge 
buffer, we can clearly make do with a public domain SAT solver. 



8 A Proposal for a Methodology 

From the previous section, it is clear that BDD-based model checking, BMC, and
STE have very different characteristics. Based on the experiences we have had
while locating design errors in the merge buffer, we have the following suggestion
for a methodology:

- Start the analysis of a new subbox with bounded model checking.

- Initially test a new property with a small bound, so that the check only
takes a few seconds. This will catch low-hanging fruit, and alert us to simple
problems with inputs that are not properly constrained.

- Remove false counterexamples by modifying the transactors or the property,
as appropriate.

- Start looking for long failures of the property. Choose a small set of bounds,
ranging from medium long up to very challenging, and check each of them
using the timed CAPTAIN PROVE strategy. Use longer and longer saturation
times.

- Use STE to quickly check that the problem is fixed whenever the designers
have corrected a bug found using BMC. Also abstract the failure trace by
making some of the inputs or control signals symbolic. This allows quick
checking for failures that are similar to the original failure.

- When the BMC checks start taking more than half an hour or so, start
working in parallel on using STE to find the bug.

- If neither BMC nor STE seems to find any failures, try SMV or move on to
another property.

9 Conclusions 

In this paper, we have presented the techniques that we have used to find bugs 
in a crucial component of a microprocessor in design. Our approach is based 
on bounded model checking and a SAT-based version of symbolic trajectory 
evaluation that we have developed. 

Our experimental results demonstrate that it is possible for BMC to
outperform state-of-the-art BDD-based symbolic model checking by two orders of
magnitude, even when we look for bugs in deeply pipelined industrial
components. None of the bugs described here has been a false counterexample. As a
result, their complexity in terms of the length of minimum failure traces has been
significantly larger than what has previously been found using SAT-based techniques.

We have had less time to evaluate the use of SAT-based STE, but it seems 
clear that it is a very attractive bug-finding method. We have used STE to find 
bugs as deep as the ones we have been able to find using BMC, with negligible 
runtimes. However, this does not come for free; we have decreased the tool’s 
runtime by spending more time developing specifications. 

We have also presented a comparison of the performance of CAPTAIN PROVE 
and GRASP for BMC and STE, and suggested a methodology for SAT-based 
industrial bug finding. 

We believe that the approach we have presented here can be cost effective,
and that the techniques we have used will become vital instruments in the
standard verification toolbox. During the two months when the work that is
presented in this paper was done, we improved the SAT-based framework FixIt
significantly and removed many bottlenecks that we had not encountered on
academic examples. The dramatic decrease in runtimes that we achieved in this
short time makes us believe that there is a large potential for further
improvement.







Per Bjesse, Tim Leonard, and Abdel Mokkedem 



Acknowledgements 

Many thanks to Gunnar Andersson, Luis Baptista, Arne Boralv, and Joao
Marques Silva, who gave advice on running the SAT-solvers. We would also like to
thank Gabriel Bischoff, John Matthews and Mary Sheeran for their useful
comments on earlier drafts of this paper. Finally, Per Bjesse thanks Compaq’s Alpha
Development group for hosting him during the autumn of 2000.



References 

1. M. Aagaard, R.B. Jones, T.F. Melham, J.W. O’Leary, and C.-J.H. Seger. A
methodology for large-scale hardware verification. In Formal Methods in Computer
Aided Design, November 2000.

2. P.A. Abdulla, P. Bjesse, and N. Een. Symbolic reachability analysis based on
SAT-solvers. In Proc. TACAS ’00, 6th Int. Conf. on Tools and Algorithms for the
Construction and Analysis of Systems, 2000.

3. A. Biere, A. Cimatti, E.M. Clarke, and Y. Zhu. Symbolic model checking
without BDDs. In Proc. TACAS ’99, 5th Int. Conf. on Tools and Algorithms for the
Construction and Analysis of Systems, 1999.

4. A. Biere, E.M. Clarke, R. Raimi, and Y. Zhu. Verifying safety properties of a
PowerPC [tm] microprocessor using symbolic model checking without BDDs. In
Proc. Int. Conf. on Computer Aided Verification, 1999.

5. P. Bjesse and K. Claessen. SAT-based verification without state space traversal.
In Formal Methods in Computer Aided Design, November 2000.

6. E.M. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, December
1999.

7. A. Gupta, Z. Yang, and P. Ashar. SAT-based image computation with application
in reachability analysis for verification. In Formal Methods in Computer Aided
Design, November 2000.

8. C.N. Ip and D. Dill. Better verification through symmetry. Formal Methods in
System Design, 9(1/2):41-75, August 1996.

9. K.L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.

10. K.L. McMillan. The SMV language. Technical report, Cadence Berkeley Labs,
1999.

11. Prover Technology AB. Prover 4.0 Application Programming Reference Manual,
2000. PPI-01-ARM-1.

12. C.-J.H. Seger and R.E. Bryant. Formal verification by symbolic evaluation of
partially ordered trajectories. Formal Methods in System Design, 6(2):147-190,
March 1995.

13. M. Sheeran, S. Singh, and G. Stalmarck. Checking safety properties using induction
and a SAT-solver. In Formal Methods in Computer Aided Design, November 2000.

14. M. Sheeran and G. Stalmarck. A tutorial on Stalmarck’s proof procedure for
propositional logic. Formal Methods in System Design, 16(1):23-58, January 2000.

15. J.P.M. Silva. Search algorithms for satisfiability problems in combinational
switching circuits. PhD thesis, EECS Department, University of Michigan, May 1995.

16. P.F. Williams, A. Biere, E.M. Clarke, and A. Gupta. Combining decision diagrams
and SAT procedures for efficient symbolic model checking. In Proc. 12th Int. Conf.
on Computer Aided Verification, 2000.




Towards Efficient Verification of
Arithmetic Algorithms over Galois Fields $GF(2^m)$

Sumio Morioka, Yasunao Katayama, and Toshiyuki Yamane 
IBM Tokyo Research Laboratory 

1623-14 Shimotsuruma, Yamato-shi, Kanagawa-ken 242-8502, Japan 
e02716@jp.ibm.com
Tel: +81-46-215-5736 Fax: +81-46-273-7413 

Abstract. The Galois field $GF(2^m)$ is an important number system that is
widely used in applications such as error correction codes (ECC), and
complicated combinations of arithmetic operations are performed in those
applications. However, few practical formal methods for algorithm
verification at the word-level have ever been developed. We have defined
a logic system, $GF_{2^m}$-arithmetic, that can treat non-linear and non-convex
constraints, for describing specifications and implementations of arithmetic
algorithms over $GF(2^m)$. We have investigated various decision
techniques for the $GF_{2^m}$-arithmetic and its subclasses, and have performed
an automatic correctness proof of a $(n, n-4)$ Reed-Solomon ECC decoding
algorithm. Because the correctness criterion is in an efficient subclass of
the $GF_{2^m}$-arithmetic ($2^p$-field-size independent), the proof is completed in
significantly reduced time, less than one second for any $m \ge 3$ and $n \ge 5$,
by using a combination of polynomial division and variable elimination
over $GF(2^m)$, without using any costly techniques such as factoring or a
decision over $GF(2)$ that can easily increase the verification time to more
than a day.

1 Introduction 

Due to the exponential growth of scale and speed of networks and computer systems, 
the importance and use of error correction codes (ECC) and crypto systems have been 
increasing rapidly. In the majority of these algorithms, the Galois field $GF(2^m)$ [1], or
finite field, is used as a number system. 

Because complicated combinations of arithmetic operations are performed in these 
algorithms, the necessity of applying formal verification to the entire algorithms at the 
word level (i.e., checking if the entire combination of operators is correct or not) is 
very high. The domain space of their inputs can be extremely wide, and therefore, 
ensuring the correctness of the entire algorithm is almost impossible by any 
testing-based method. 

However, little research has ever been reported on the verification of arithmetic
algorithms over Galois fields, although much research has been done for the other
number systems such as integers (Presburger arithmetic etc.) [2,3], rational numbers
[4], floating point numbers [5] and so on. A decision diagram for Galois fields, based
on the decomposition of multiple-valued functions, has been proposed recently [6], but
treating practical fields such as $GF(2^8)$, $GF(2^{16})$, $GF(2^{32})$ and larger and/or
algorithms (formulas) having many operators is still difficult.

In this paper, we investigate how to efficiently prove the correctness of practical
algorithms over Galois fields. For describing algorithm specifications and
implementations at the word level, we have defined a logic system, $GF_{2^m}$-arithmetic,
that is a subclass of first-order logic and which can describe non-linear and
non-convex constraints. This logic can treat bit-level descriptions, too.

We have performed a correctness proof of a key part of a practical $(n, n-4)$
Reed-Solomon ECC decoding algorithm [7,8,9], over $GF(2^8)$ ($m = 8$) or larger. We
have examined various decision techniques for the $GF_{2^m}$-arithmetic or its subclasses.
Our first experimental result showed that even a small portion of the entire proof
required too much CPU time (more than a day), if proof methods over $GF(2)$ such as
decision diagrams (DDs) [10,11] were used. By using a special decision procedure
based on a combination of polynomial division and variable elimination over $GF(2^m)$,
without using any costly techniques such as factoring [1] and proof methods over
$GF(2)$, the CPU time for the proof was significantly reduced to less than one second
(Pentium III 800MHz), for any $m \ge 3$ and $n \ge 5$. One of the reasons why we could
shorten the proof time significantly is that the correctness criterion for the RS-ECC
verification is in an efficient subclass of the $GF_{2^m}$-arithmetic: $2^p$-field-size independent.

This paper is organized as follows. In Section 2, we will define a logic system,
$GF_{2^m}$-arithmetic. In Section 3, we will investigate various decision techniques for the
$GF_{2^m}$-arithmetic or its subclasses. In Section 4, we will show how we achieved a
short verification time for a practical RS decoding algorithm.

2 A Logic System for Describing Specifications and 

Implementations of Arithmetic Algorithms over Galois Fields 

2.1 The Language and Interpretation of $GF_{2^m}$-Arithmetic

We have defined a $GF_{2^m}$-arithmetic that is a subclass of the first-order logic. In any
instance (sentence) of the $GF_{2^m}$-arithmetic, only arithmetic operators over Galois
fields $GF(2^m)$ ($+, -, \times, \div$), equality ($=$) and logical operators ($\wedge, \vee, \neg$) can be used as
functions or predicates. Please note that $m$ is a given constant value and is not a
variable. Symbols used in expressions in the $GF_{2^m}$-sentence are
$\wedge, \vee, \neg, \forall, \exists, =, +, \times, \div, -, 1, 0, x, y, z, \ldots, P, Q, R, \ldots$

Definition of the language is as follows: 

1. $GF_{2^m}$ variable and $GF_{2^m}$ constant: $x, y, z, \ldots$ are $GF_{2^m}$ variables. Unlike integers,
every constant value in $GF(2^m)$ has two representations [1]. The first one is the
exponentiation representation where an element of $GF(2^m)$ is represented as one of
$\{0, 1, \alpha^i\}$. Here, $\alpha$ is a generator element of a given field and $i$ is an integer constant
($1 \le i \le 2^m - 2$). The second one is the vector representation where an element of
$GF(2^m)$ is represented as a vector "$c_{m-1}, c_{m-2}, \ldots, c_0$" over $GF(2)$.

2. Term: Only variables, constants, $T_1 + T_2$, $T_1 - T_2$, $T_1 \times T_2$, and $T_1 \div T_2$ are terms,
if $T_1$ and $T_2$ are terms.

3. Atom: An expression of the form $T_1 = T_2$ is an atom, where $T_1$ and $T_2$ are terms.

4. $GF_2$ variable and $GF_2$ constant: 1 and 0 are $GF_2$ constants (they are also elements
of $GF_{2^m}$ constants). $P, Q, R, \ldots$ are $GF_2$ variables.

5. Formula: Only atoms, $GF_2$ variables, $GF_2$ constants, $F_1 \wedge F_2$, $F_1 \vee F_2$, $\neg F_1$, $\forall x F_1$
and $\exists x F_1$ are formulas, where $F_1$ and $F_2$ are formulas and $x$ is a $GF_{2^m}$ or $GF_2$
variable.

6. Sentence: A formula which has no free variable is a sentence. $GF_{2^m}$-arithmetic is
the set of all sentences.

The interpretation of operators is defined once the field size $m$, irreducible
polynomial and basis (polynomial basis, normal basis, etc. [1]) are fixed. The domain
of $GF_{2^m}$ variables is the entire field and the domain of $GF_2$ variables is $\{1, 0\}$. All of
the above functions and predicates have their natural interpretations. In the following,
we will use various standard abbreviations for simplicity: $xy$ denotes $x \times y$, $t^p$ denotes
$\prod_{i=1}^{p} t$, $F_1 \Rightarrow F_2$ denotes $\neg F_1 \vee F_2$, $F_1 \Leftrightarrow F_2$ denotes $(F_1 \wedge F_2) \vee (\neg F_1 \wedge \neg F_2)$,
if $F_1$ then $F_2$ else $F_3$ denotes $(F_1 \Rightarrow F_2) \wedge (\neg F_1 \Rightarrow F_3)$, and so on.

Examples: 3xVy(x^ = x^y + x => 3z(x +y = z)) is a sentence. Neither 3x3y(x^ = 0) (a 
variable is used as the power for exponentiation) nor 3x(x = y) (y is a free variable) is 
a sentence. 

2.2 Various Useful Subclasses of the $GF_{2^m}$-Arithmetic

In most of the actual verification applications, as will be described in Section 4, only
limited sentences in the following subclasses of the $GF_{2^m}$-arithmetic appear. The
relationships of the subclasses are shown in Fig. 1.

A. Basis Independent Sentences. 

These are sentences whose truth is independent of the representation basis. If a 
sentence contains no vector-constant (constant value in vector representation), then 
the sentence is basis independent. Usually, if a sentence contains multiplication by a 
vector-constant, it is basis dependent. 

B. Irreducible Polynomial Independent Sentences. 

These are sentences whose truth is independent of the irreducible polynomial. 
Usually, if a sentence contains addition by a constant value in an exponentiation 
representation, it is irreducible polynomial dependent. 

Regarding A and B above, the following interesting theorem holds.

Theorem 1: If a sentence of the $GF_{2^m}$-arithmetic contains no $GF_{2^m}$-constant other
than 0 or 1, then the sentence is basis and irreducible polynomial independent. □
(Proof) For any fields $A$ and $B$ whose sizes are the same, there is an isomorphism
function $\delta$ from the elements in $A$ to the elements in $B$ where $\delta(x + y) = \delta(x) + \delta(y)$,
$\delta(xy) = \delta(x)\delta(y)$, $\delta(1) = 1$ and $\delta(0) = 0$. Consider a sentence $S$ over the field $A$. Then,
for any atom $term(x, y, z, \ldots) = 0$ in $S$, a relation $term(x, y, z, \ldots) = 0 \Leftrightarrow
term(\delta(x), \delta(y), \delta(z), \ldots) = 0$ holds, if $S$ contains no $GF_{2^m}$-constant other than 0 and 1.
Therefore, the truth of $S$ is the same over the field $B$, by substituting $\delta(x), \delta(y), \delta(z), \ldots$
into $x, y, z, \ldots$. □

[Figure 1: diagram of the sentences of the $GF_{2^m}$-arithmetic and their subclasses, with
regions labelled "Basis independent" and "Irreducible poly. independent" and the annotation
"Most of the arithmetic algorithms such as Reed-Solomon ECC".]

Fig. 1. Relationships of subclasses A-E of the $GF_{2^m}$-Arithmetic


C. $2^p$-Extension-Field-Size Independent Sentences.

A sentence is $2^p$-extension-field-size independent, if there exists a constant $p \ge 1$ such
that the truth of the sentence is the same over any extension fields of $GF(2^p)$,
including GF{2P). For example, a sentence 3x(v'® =x is 2"*-extension- 

field-size independent. This sentence is true over GF{2‘^), GF’(2^^"*), GF(2^^"^)and so 
on. Clearly, any $2^p$-extension-field-size independent sentence is both basis and
irreducible polynomial independent.

D. $2^p$-Field-Size Independent Sentences.

This subclass is important because the correctness criteria for most of the arithmetic
algorithms defined over Galois Fields are in this subclass.

A sentence is $2^p$-field-size independent, if there exists a constant $p \ge 1$ such that the
truth of the sentence is the same over all of $GF(2^i)$ ($i \ge p$). For example, a sentence
$\forall x \forall y\,(y = 1 \Rightarrow x^2 = xy)$ is $2^2$-field-size independent. This sentence is false over
$GF(2^2)$, $GF(2^3)$, $GF(2^4)$ and so on.

E. Field-Size Independent Sentences.

A $2^1$-field-size independent sentence is field-size independent. For example, a
sentence $\forall x \forall y\,(y = x \Rightarrow x^2 = xy)$ is field-size independent.

F. $\exists$-only (or $\forall$-only) Prenex Normal Form Sentences.

One of the most important subclasses of the $GF_{2^m}$-arithmetic is the prenex normal
form sentences where all of the quantifiers are $\forall$. Any $\forall$-only prenex normal form
sentence can be transformed into an $\exists$-only sentence, by using the relation
$\forall x\, p(x) \Leftrightarrow \neg \exists x\, \neg p(x)$.

3 Deciding the Truth Value of a Sentence of the $GF_{2^m}$-Arithmetic

3.1 Decidability of the $GF_{2^m}$-Arithmetic and the Problem of Deciding the
Truth Value

Unlike the case of integers or rational numbers, the truth value of any sentence $S$ of the
$GF_{2^m}$-arithmetic is decidable, even if $S$ contains multiplications. This is because the
domain of all the variables is finite. The truth of $S$ can be determined by substituting
all possible $2^m$ values into each variable in $S$. Clearly, the upper bound of
computational complexity of this direct substitution method is $O(2^{m v_1} \cdot 2^{v_2})$, where $v_1$
is the number of $GF_{2^m}$ variables and $v_2$ is the number of $GF_2$ variables (please note
that the lower bound is not yet known). Therefore, the development of some decision
heuristics is necessary.
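The direct substitution method can be sketched in a few lines for very small fields. This is only an illustration under our own assumptions: the irreducible polynomials in `IRREDUCIBLE`, the helper names, and the second example sentence $\forall x\,(x^2 = x)$ are chosen here and are not taken from the paper. The first sentence is the field-size independent example $\forall x \forall y\,(y = x \Rightarrow x^2 = xy)$.

```python
from itertools import product

# Assumed irreducible polynomials over GF(2), as bit masks:
# x+1, x^2+x+1, x^3+x+1, x^4+x+1.
IRREDUCIBLE = {1: 0b11, 2: 0b111, 3: 0b1011, 4: 0b10011}


def gf_mul(a, b, m):
    """Multiply two elements of GF(2^m) given in vector (bit mask) representation."""
    poly = IRREDUCIBLE[m]
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition over GF(2) is XOR
        b >>= 1
        a <<= 1
        if a >> m:           # degree reached m: reduce by the field polynomial
            a ^= poly
    return result


def forall(m, n_vars, body):
    """Substitute all 2^m values into each of n_vars variables."""
    return all(body(*vals) for vals in product(range(2 ** m), repeat=n_vars))


def implies(p, q):
    return (not p) or q


def square_sentence(m):
    """forall x, y: y = x  =>  x^2 = x*y."""
    return forall(m, 2, lambda x, y: implies(y == x, gf_mul(x, x, m) == gf_mul(x, y, m)))


def idempotent_sentence(m):
    """forall x: x^2 = x (illustrative; it holds only over GF(2))."""
    return forall(m, 1, lambda x: gf_mul(x, x, m) == x)


if __name__ == "__main__":
    for m in (1, 2, 3, 4):
        print(m, square_sentence(m), idempotent_sentence(m))
```

With two $GF_{2^m}$ variables the loop performs $2^{2m}$ evaluations, which is already $2^{16}$ for $m = 8$ and grows by a factor of $2^m$ with every additional variable.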

However, in considering heuristics, we have found that interpreting both addition
and multiplication at the same abstraction level can be a difficult problem. The reason
is that interpreting multiplication over $GF(2^m)$ can be performed efficiently at the
$GF(2^m)$-level, which corresponds to the word level, using the exponential
representation by adding the values of the exponents of the elements as natural
numbers. On the other hand, interpreting addition over $GF(2^m)$ can be performed
efficiently at the $GF(2)$-level, which corresponds to the bit level, using the vector
representation by adding corresponding vector elements over $GF(2)$ in parallel. If a
sentence contains both addition and multiplication, converting the representation is
necessary for evaluating the sentence, but conversion from the vector representation to
the exponential representation is a discrete logarithm problem. The Zech logarithm is
a known method to reduce the size of the addition-table, but only the reduction from
$m2^{2m}$ to $m2^m$ is possible.
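The two representations can be illustrated on a small field. The sketch below assumes $GF(2^3)$ with the irreducible polynomial $x^3 + x + 1$ and the generator $\alpha = x$; these choices, and the names `antilog`, `log`, `mul` and `add`, are ours. For such a small field the conversion tables can simply be enumerated, which is exactly what becomes impractical as $m$ grows.

```python
M = 3
POLY = 0b1011        # x^3 + x + 1, assumed irreducible polynomial
ORDER = 2 ** M - 1   # size of the multiplicative group

# antilog[i] is the vector representation of alpha^i, with alpha = x.
antilog = []
value = 1
for _ in range(ORDER):
    antilog.append(value)
    value <<= 1              # multiply by alpha
    if value >> M:
        value ^= POLY        # reduce by the field polynomial

# log[v] is the exponent i with alpha^i = v (the discrete logarithm table).
log = {v: i for i, v in enumerate(antilog)}


def mul(a, b):
    """Word-level multiplication: add the exponents modulo 2^m - 1."""
    if a == 0 or b == 0:
        return 0
    return antilog[(log[a] + log[b]) % ORDER]


def add(a, b):
    """Bit-level addition: XOR of the vector representations."""
    return a ^ b


if __name__ == "__main__":
    print(antilog)
    print(mul(0b011, 0b110), add(0b011, 0b110))
```

Evaluating a term such as $\alpha^3 \times \alpha^4 + \alpha$ needs `log` for the product and the vector form for the sum, so both tables are consulted; this is the representation change the text refers to.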

3.2 Basic Structure of Decision Procedures for the $GF_{2^m}$-Arithmetic

Following the discussions in Section 3.1, the basic structure of the decision procedure
for the $GF_{2^m}$-arithmetic is as follows:

Step 1 (Decision over $GF(2^m)$): Simplify a given sentence using the well-known
mathematical theorems of operators over $GF(2^m)$ (Table 1). Standard techniques of
theorem proving (term rewriting, case analysis, etc.) and standard expression
transformation rules over propositional logic and first-order logic can be used, too.
How to apply those techniques and rules is highly application specific, and an
example will be discussed in Section 3.3. In many cases, the truth cannot be
completely determined in this Step 1 and if so, go to Step 2.

If a given sentence is basis or irreducible polynomial dependent, the truth of the 
sentence cannot be determined using only the rules in Table 1, because all of the rules 
in Table 1 are both basis and irreducible polynomial independent transformations. 
More concretely, sentences that contain constant values other than 0 and 1 usually 
require Step 2. 
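Several of the rules in Table 1 follow from the single characteristic-2 identity $A + A = 0$. As a short worked derivation, assuming the squaring rule in the table reads $A^2 + B^2 + \cdots = (A + B + \cdots)^2$:

```latex
\begin{align*}
A - B &= (A - B) + (B + B) = A + B, \\
(A + B)^2 &= A^2 + AB + BA + B^2 = A^2 + (AB + AB) + B^2 = A^2 + B^2 .
\end{align*}
```

The field size dependent rule is different in kind: the $2^m - 1$ non-zero elements form a multiplicative group, so $A^{2^m - 1} = 1$ for $A \ne 0$ and hence $A^{2^m} = A$ for every $A$, which depends on $m$.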

Table 1. Mathematical Theorems over $GF(2^m)$
(Basis and irreducible polynomial independent rules).

Field size independent rules:
    $A + B = B + A$                 $A \times B = B \times A$                         $A \times (B + C) = (A \times B) + (A \times C)$
    $A + (B + C) = (A + B) + C$     $A \times (B \times C) = (A \times B) \times C$     $A^2 + B^2 + \cdots = (A + B + \cdots)^2$
    $A + 0 = A$                     $A \times 1 = A$                                  if $B \ne 0$, $A \div B = C \Leftrightarrow A = B \times C$
    $A + A = 0$                     $A \times 0 = 0$                                  $A - B = A + B$

Field size dependent rule:
    $A^{2^m} = A$

Step 2 (Decision over $GF(2)$): Apply the techniques in Section 3.4, or substitute all
possible values into each variable and evaluate the truth of the sentence. When
substitution is performed and $m$ is large, avoiding conversion from vector
representation to exponential representation by using the following steps will be good,
even though each step may sometimes increase the size of the sentence rapidly:

- Step 2a: Transform all of the atoms in the sentence into sum-of-product form.

- Step 2b: Evaluate all multiplications over $GF(2^m)$ using the exponential
representation.

- Step 2c: Convert all of the elements in the sentence from exponential
representation to vector representation.

- Step 2d: Evaluate all addition over $GF(2)$ using the vector representation.

3.3 A Step 1 Implementation for ECC Algorithm Verification 

In Step 1 in Section 3.2, the efficiency of deciding the truth value is very dependent 
on how a given sentence is evaluated. In the following, we show an example 
implementation of Step 1 as it will be used in Section 4.3.3. This example is suitable 
for word-level verification of a Reed-Solomon ECC, and is constructed so that Step
2 is unnecessary when a complete (correct) ECC algorithm implementation is given. 

As will be shown in Section 4.2, in general, the correctness criteria of a key part of 
ECC algorithms have the following characteristics: 

- Described as $\forall$-only prenex normal sentences, because the correctness criteria
mean that output of the algorithm is correct for any input value.

- Often transformed into the form $\exists$ or $\forall$ $(term = variable \wedge formula)$ or
$\exists$ or $\forall$ $(term \ne variable \wedge formula)$.

- Basis and irreducible polynomial independent because the correctness criteria
satisfy Theorem 1 in Section 2.2. Therefore, Step 2 may be unnecessary.

- $2^p$-field-size independent (actual value of $p$ is different for different ECC codes).
Therefore, no field-size dependent rule from Table 1 may be required.

Based on the above characteristics, we have implemented Step 1 as follows: 

Step 1a (Preparation): Transform the given sentence $S$ into prenex normal form
(usually unnecessary because the given sentence should already be in prenex normal
form). Eliminate all $\neg$ by using De Morgan’s Law and transform all of the atoms into
the form $term = (\ne)\ 0$, where term is in the sum-of-product form. Eliminate all division
and unnecessary terms by using rules such as $T_1 = T_2 \div T_3 \Leftrightarrow T_1 \times T_3 = T_2$ (when
$T_3 \ne 0$) and $T + T = 0$.

Step 1b (Eliminate Variables): Try to transform all of the atoms into the form
term $T$ = variable $V$, where no $V$ appears in $T$. If any atom is in the form
$(T_1^2 + T_2^2 + \cdots + T_k^2) = 0$, replace it by $T_1 + T_2 + \cdots + T_k = 0$. If the form of the entire
sentence $S$ becomes $\exists$ or $\forall$ $(T = V \wedge formula)$, eliminate the variable $V$ by substituting
$T$ into all occurrences of $V$ in the sentence. After that, again transform all of the
atoms into the form $term = (\ne)\ 0$ and then, the truth value of any atom that contains no
variable can be determined.

Step 1c (Factoring and Construct Zero/Non-zero Set): Try factoring the left side of
all atoms $term \ne 0$ (this factoring can be sometimes omitted, as shown in Section
4.3.3) [1]. If the entire sentence is of the form $\exists$ or $\forall$ $(T_1 T_2 \cdots T_k \ne 0 \wedge formula)$,
construct a set of non-zero terms $\Phi = \{T_1, T_2, \ldots, T_k, \ldots\}$ (otherwise, $\Phi = \emptyset$).
Similarly, if the sentence is of the form $\exists$ or $\forall$ $(U = 0 \wedge formula)$, construct a set of
zero terms $\Psi = \{U, \ldots\}$.

Step 1d (Division by Polynomials in Zero/Non-zero Set): If the non-zero set $\Phi$ is
not empty, test if the left side term of each atom can be factored by an element in $\Phi$.
If a term is factored by an element $T_i$, then replace the term by the quotient and
append a new atom $T_i \ne 0$ to the entire sentence, connected by $\wedge$. In the same
manner, test by the zero-set $\Psi$ and if a term is factored by an element in $\Psi$, replace
the term by 0.

Step 1e (Transform Expression to Make a New Zero/Non-zero Set): Iterate Steps
1b-1d until no change occurs. If the truth value of the entire sentence has been
determined, terminate the procedure. Otherwise, apply the techniques (i) and (ii)
below. If the form of the entire sentence becomes $\exists$ or $\forall$ $(term = variable \wedge formula)$
or $\exists$ or $\forall$ $(term \ne variable \wedge formula)$ as a result, then return to Step 1b. If not, go to
Step 2 in Section 3.2.

(i) If $S$ is a $\forall$-only (or $\exists$-only) prenex normal form sentence $\forall x \forall y \forall z \cdots f(x, y, z, \ldots)$,
make a corresponding new $\exists$-only (or $\forall$-only) sentence $\exists x \exists y \exists z \cdots \neg f(x, y, z, \ldots)$.

(ii) Case analysis by variable: select the innermost quantified variable $V$ (or any
variable, when the sentence is $\forall$-only or $\exists$-only prenex normal form) and partition the
entire proof into a case $V = 0$ and another case $V \ne 0$.

3.4 General Verification-Cost Reduction Techniques for Step 2

3.4.1. Field Change for Basis or Irreducible Polynomial Independent Sentences.

If a sentence is basis or irreducible polynomial independent, we can select an
appropriate basis or irreducible polynomial at the verification stage, even if a different
field is actually used for the algorithm implementation. The cost of evaluating
sentences can be reduced by changing the basis or irreducible polynomial, because the
cost of a multiplication can be reduced from $O(m^3)$ to $2m^2 - 1$ or fewer operations
over $GF(2)$ [12].

3.4.2. Field Change for $2^p$-(Extension)-Field-Size Independent Sentences.

If a sentence is $2^p$-extension-field-size, $2^p$-field-size or field-size independent, we can
select a small field at the verification stage, for significant reduction of the variable
domain space.

[Figure 2: block diagram with four stages: (i) Calc. syndrome, (ii) Calc. polynomials'
coefficients, (iii) Eval. polynomials, (iv) Modify errors.]

Fig. 2. Block Diagram of the One Shot Reed-Solomon Decoding Algorithm



3.4.3. Use of DDs for $\exists$-only (or $\forall$-only) Prenex Normal Form Sentences.

For $\exists$-only (or $\forall$-only) prenex normal form sentences $S$ over $GF(2^m)$, the truth value
can be determined as follows: (i) expand the entire $S$ into an equivalent sentence $T$
over $GF(2)$, by replacing all of the variables and operators by their bit-level
implementations, (ii) transform $T$ into a canonical form $U$ over $GF(2)$ and (iii) check
if $U$ is a constant 0 (or 1). This is the same method used in model checking over
Boolean variables [13] and is usually much faster than the direct substitution method.
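
Steps (i) to (iii) can be illustrated on a tiny field. The following is a minimal sketch, assuming $GF(2^2)$ with the irreducible polynomial $x^2 + x + 1$ in a polynomial basis, and using an exhaustive truth table as the canonical form in place of a decision diagram. The example sentences and all function names are illustrative assumptions, not taken from the paper.

```python
from itertools import product

# Bit-level implementations of the GF(2^2) operators (assumed polynomial
# basis, x^2 = x + 1). An element is a pair of bits (high, low).
def add_bits(a, b):
    return (a[0] ^ b[0], a[1] ^ b[1])

def mul_bits(a, b):
    a1, a0 = a
    b1, b0 = b
    c0 = (a0 & b0) ^ (a1 & b1)
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a1 & b1)
    return (c1, c0)

# Step (i): the sentence "for all a, b: (a + b)^2 = a^2 + b^2" expanded
# into a Boolean function T of the four bits a1, a0, b1, b0.
def sentence_true(a1, a0, b1, b0):
    a, b = (a1, a0), (b1, b0)
    s = add_bits(a, b)
    return int(mul_bits(s, s) == add_bits(mul_bits(a, a), mul_bits(b, b)))

# A sentence that is not valid: "for all a, b: a * b = a + b".
def sentence_false(a1, a0, b1, b0):
    a, b = (a1, a0), (b1, b0)
    return int(mul_bits(a, b) == add_bits(a, b))

# Step (ii): a truth table is a (very naive) canonical form U of T.
def truth_table(f, nbits):
    return tuple(f(*bits) for bits in product((0, 1), repeat=nbits))

# Step (iii): a forall-only sentence holds iff U is the constant 1.
U = truth_table(sentence_true, 4)
is_valid = all(v == 1 for v in U)
print(is_valid, all(truth_table(sentence_false, 4)))
```

A real tool would build an OBDD or OFDD for $T$ instead of enumerating all assignments; the truth table only makes the "constant 0 (or 1)" check easy to see.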

4 Verification Example of a Reed-Solomon Decoding Algorithm 

4.1 An Ultrafast Reed-Solomon Decoding Algorithm 

The systematic $(n,k)$ Reed-Solomon codes (RS codes), where $n$ is the code word
length and $k$ is the information word (or message word) length, have a maximum
error-correcting capability of $\lfloor (n-k)/2 \rfloor$ symbols. They are used in many areas, such
as communication, storage, and fault-tolerant memory systems [7].

In [8,9], an ultrafast decoding algorithm for RS codes (one shot RS algorithm) was
described. This one shot RS algorithm can achieve a throughput of more than 1 Gbps
when implemented in hardware. It is designed for combinational circuit
implementation and does not use any iterative execution (loops).

We have selected this algorithm for our verification example because formal
verification is highly desirable for it: the algorithm contains many operators, and
many mistakes could be made during its implementation. In addition, avoiding
loops makes it easier to apply a formal verification procedure, although almost the
same correctness criterion and decision procedure can be applied to other RS
algorithms by adding some proof mechanisms that can handle loops [14].

Table 2. Correctness Criterion for $(n, n-4)$ RS Code (when # of errors = 2).

On a given field (given field size m, irreducible poly and basis),

$\forall s0\, \forall e0\, \forall e1\, \forall s1\, \forall a0 \cdots$   (for any value of variables)

(   -- input constraints (mathematical def. of syndrome s0-s3)
    s0 = e0 + e1 and s1 = e0*a0 + e1*a1
    and s2 = e0*a0^2 + e1*a1^2 and s3 = e0*a0^3 + e1*a1^3
    and e0 /= 0 and e1 /= 0    -- error values are not 0
    and a0 /= 0 and a1 /= 0    -- error locations are not 0
    and a0 /= a1               -- error locations are different

    -- algorithm formula (compute l0-l2 and er0-er1, from s0-s3)
    and l0e2 = s0*s2 + s1^2 and l1e2 = s0*s3 + s1*s2
    and l2e2 = s1*s3 + s2^2 and l1e1 = s0
    and l2e1 = s1 and l0 = l0e2
    and if l0e2 = 0    -- select an error locator polynomial by # of errors
        then l1 = l1e1 and l2 = l2e1
        else l1 = l1e2 and l2 = l2e2
        endif
    and er0 = ((s1*l0e2)/l1e2) + s0 and er1 = ((s0*l0e2)/l1e2)
) imply (
    -- output specification (correct polynomials are obtained)
    er1*a1 + er0 = e1 and er1*a0 + er0 = e0    -- error value poly.
    and l0 /= 0 and l1/l0 = a0 + a1 and l2/l0 = a0*a1    -- error loc poly.
)



As shown in Fig. 2, the one shot RS algorithm consists of four major blocks: (i)
syndrome calculation, (ii) polynomial coefficient calculation, (iii) polynomial
evaluation and (iv) error modification. The major difference between the one shot RS
algorithm and other RS algorithms is the computation sequence in the shaded block (ii).
Block (ii) is also the most important and complicated part, and it is our verification target.
The other blocks perform simple constant matrix multiplications [9], and checking
whether those matrices are correct is neither a difficult nor a critical problem.

4.2 Correctness Criterion of the Reed-Solomon Decoding Algorithm

The verified second block computes, from 4 syndrome values $s_0, s_1, s_2$ and $s_3$, the
coefficients of the error locator polynomial ($l_0$, $l_1$ and $l_2$) and those of the error value
polynomial ($er_0$ and $er_1$). In the block, multiple error locator polynomials and error
value polynomials that correspond to different numbers of errors (from 1 to
$\lfloor (n-k)/2 \rfloor$) are computed, and one of them is selected as a result of evaluating the
number of errors that actually occurred.

In Table 2, the correctness criterion for the second block for the $(n, n-4)$ RS code is
shown. What we have proved is that, assuming the multipliers and adders are
correctly implemented at the $GF(2)$ level (bit level), the formula in the one shot
decoding algorithm computes a correct output if a mathematically appropriate input is
given. The criterion shown in Table 2 corresponds to the case when the number of
errors is two, and it is necessary to independently prove a different case when the
number of errors is one (the mathematical definitions of syndrome values are different
between these cases).

The criterion is of the form

    ((input constraints $\wedge$ algorithm formula) $\Rightarrow$ output specification)

and this statement has to be true under a given field size $m$, irreducible polynomial and
basis. All identifiers between operators are $GF(2^m)$ variables. All variables are
bound by $\forall$ and the correctness criterion is a $\forall$-only prenex normal form sentence.

Table 3. Cost for Proving the Entire Correctness Criterion over $GF(2^8)$ (Pentium III 800MHz).

Proof Method                                                CPU time
Approach 1: DD (OBDD or OFDD)                               > 1 week
Approach 2: Variable elimination and DD                     > 1 day (mostly spent by DD)
Approach 3: Polynomial division and variable elimination    0.2 second

The input constraints describe the standard mathematical definition of syndromes
using the variables a0, a1 (error locations in the error word), e0 and e1 (error
values) [7]. The output specification can be easily extracted from standard
descriptions of the RS ECC algorithms. This part means that the obtained error locator
polynomial $l_0 x^2 + l_1 x + l_2$ should be equivalent to $l_0 (x - a_0)(x - a_1)$ and the obtained
error value polynomial $Er(x) = er_1 x + er_0$ should satisfy $Er(a_i) = e_i$ ($i = 0, 1$).

The same criterion can be used for any n > 5, m > 3, irreducible polynomial and
basis.
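
The criterion in Table 2 can be sanity-checked by brute force on a small field. The sketch below assumes $GF(2^3)$ with the irreducible polynomial $x^3 + x + 1$ in a polynomial basis, enumerates every assignment of e0, e1, a0 and a1 that satisfies the input constraints, evaluates the algorithm formula, and tests the output specification. It is an exhaustive test over one small field, not the decision procedure of Section 3, and the helper names (`gf_mul`, `gf_div`, `output_spec_holds`) are hypothetical.

```python
from itertools import product

M = 3           # assumed field: GF(2^3)
POLY = 0b1011   # assumed irreducible polynomial x^3 + x + 1

def gf_mul(a, b):
    """Multiply two field elements (bits are polynomial coefficients)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:        # degree reached M: reduce modulo POLY
            a ^= POLY
    return result

def gf_div(a, b):
    """Divide a by a non-zero b using a brute-force inverse search."""
    inverse = next(c for c in range(1, 1 << M) if gf_mul(b, c) == 1)
    return gf_mul(a, inverse)

def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def output_spec_holds(e0, e1, a0, a1):
    # input constraints: mathematical definition of the syndromes
    s0 = e0 ^ e1
    s1 = gf_mul(e0, a0) ^ gf_mul(e1, a1)
    s2 = gf_mul(e0, gf_pow(a0, 2)) ^ gf_mul(e1, gf_pow(a1, 2))
    s3 = gf_mul(e0, gf_pow(a0, 3)) ^ gf_mul(e1, gf_pow(a1, 3))
    # algorithm formula
    l0e2 = gf_mul(s0, s2) ^ gf_pow(s1, 2)
    l1e2 = gf_mul(s0, s3) ^ gf_mul(s1, s2)
    l2e2 = gf_mul(s1, s3) ^ gf_pow(s2, 2)
    l0 = l0e2
    l1, l2 = (s0, s1) if l0e2 == 0 else (l1e2, l2e2)
    if l0 == 0 or l1e2 == 0:
        return False      # a division in the criterion would be undefined
    er0 = gf_div(gf_mul(s1, l0e2), l1e2) ^ s0
    er1 = gf_div(gf_mul(s0, l0e2), l1e2)
    # output specification
    return (gf_mul(er1, a1) ^ er0 == e1
            and gf_mul(er1, a0) ^ er0 == e0
            and gf_div(l1, l0) == a0 ^ a1
            and gf_div(l2, l0) == gf_mul(a0, a1))

nonzero = range(1, 1 << M)
cases = [c for c in product(nonzero, repeat=4) if c[2] != c[3]]  # a0 /= a1
failures = [c for c in cases if not output_spec_holds(*c)]
print(len(cases), len(failures))
```

Such a test only covers the chosen field; the point of the paper's approach is to obtain the result for any field size, irreducible polynomial and basis without enumeration.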

4.3 Experimental Results 

4.3.1 Approach 1: Direct Proof by DDs over $GF(2)$.

Because the correctness criterion is a $\forall$-only prenex normal form sentence, we first
tried to decide the truth of the entire sentence by DDs (see Section 3.4.3). We
examined shared-OBDD [10] and shared-OFDD [11]. We thought that FDDs would
be efficient, because multiplication over $GF(2^m)$ (Mastrovito multiplier [12]) is
usually implemented as a positive polarity Reed-Muller formula (PPRM) over $GF(2)$
[9,15]. However, too much proof time was necessary for those DDs (more than a
week, when $m = 8$) because too many variables appeared in the correctness criterion.

4.3.2 Approach 2: Variable Elimination over $GF(2^m)$ and DDs over $GF(2)$.

To reduce the number of variables in the sentence, we tried a decision procedure that
performs only Steps 1a, 1b and 1e in Section 3.3. Using this procedure, the
correctness criterion was transformed into an $\exists$-only sentence
$\exists(\mathit{term} = \mathit{variable} \wedge \mathit{formula})$, which should be false, and the variables s0, s1, s2,
s3, l0e2, l1e2, l2e2, l1e1, l2e1, er0 and er1 were replaced by terms
that consist of e0, e1, a0 and a1. However, because the if-sentence in Table 2 still
remains, as well as the variables l1 and l2 (in the then/else clause of the if-sentence),
it was not yet possible to determine the truth value of the entire sentence.

Therefore, we tried to evaluate the truth of the if-condition part, which was rewritten
as $e_0 e_1 a_0^2 + e_0 e_1 a_1^2 = 0$, by using DDs. More precisely, a simple $\exists$-only sentence
$\exists e_0 \exists e_1 \exists a_0 \exists a_1 ((e_0, e_1, a_0, a_1 \neq 0 \wedge a_0 \neq a_1) \Rightarrow e_0 e_1 a_0^2 + e_0 e_1 a_1^2 = 0)$
was evaluated separately. However, too much time was still necessary over $GF(2^8)$
(Tables 3, 4). Assuming the sentence is field-size independent, we tried to change the
field size and proved it over smaller fields, which was quite effective for verification
cost reduction (Table 4). However, we do not yet know any efficient formal method
to prove whether the sentence is ($2^p$-)field-size independent or not. A proof over a smaller
field at least increases the probability of algorithm correctness.

Table 4. Cost for Evaluating the If-condition Part Using BDD/FDD (P-III 800MHz).

Field size and            # of    OBDD                              OFDD
irreducible polynomial    bits    Max. BDD size   CPU time (sec)    Max. FDD size   CPU time (sec)
GF(2^2), x^2+x+1           8      62              < 1               40              < 1
GF(2^3), x^3+x+1          12      276             < 1               175             < 1
GF(2^4), x^4+x+1          16      1,145           < 1               871             < 1
GF(2^5), x^5+x^?+1        20      4,820           8                 6,784           726
GF(2^6), x^6+x^5+1        24      18,520          359               > 50,000        > 22,000
GF(2^7), x^7+x+1          28      75,096          14,277            N/A             N/A

Table 5. Irreducible Polynomial vs. Max. BDD Size (over GF(2^6)) (P-III 800MHz).

Irreducible polynomial    Max. BDD size   CPU time (sec)
x^6+x^5+1                 18,520          359
x^6+x+1                   18,673          364
x^6+x^?+x^?+x^?+1         20,487          439
x^6+x^5+x^4+x+1           20,558          439
x^6+x^?+x^?+x+1           20,566          459
x^6+x^?+x^?+x+1           20,611          469

Table 6. Variable Ordering vs. Max. BDD Size (over GF(2^5), x^5+x^?+1) (P-III 800MHz).

Variable ordering (from BDD top to leaf)              Max. BDD size   CPU time (sec)
a1(4)-a1(0), a0(4)-a0(0), e1(4)-e1(0), e0(4)-e0(0)    4,820           8
a1(0)-a1(4), a0(0)-a0(4), e1(0)-e1(4), e0(0)-e0(4)    4,775           8
e1(4)-e1(0), e0(4)-e0(0), a1(4)-a1(0), a0(4)-a0(0)    3,320           10
e1(4)-e1(0), a1(4)-a1(0), a0(4)-a0(0), e0(4)-e0(0)    12,280          57
a1(4)-a1(0), e1(4)-e1(0), a0(4)-a0(0), e0(4)-e0(0)    41,191          355

We also examined changing the polynomial, because this sentence satisfies
Theorem 1 in Section 2.2, and it is a basis and irreducible polynomial independent
sentence. As shown in Table 5, the proof time was slightly better when trinomials
were used, because fewer $GF(2)$ operators are used in a multiplier [12].
We examined various variable orderings of DDs, and found it was better to place
variables that appeared in the same atoms close together in the ordering (Table 6).

4.3.3. Approach 3: Polynomial Division and Variable Elimination over $GF(2^m)$.

Because proof methods over $GF(2)$ were still too inefficient, we incorporated Steps
1c and 1d into the decision procedure, in order to perform the proof only over $GF(2^m)$.
As mentioned in Section 3.3, Step 2 in Section 3.2 could be eliminated because the
correctness criterion is basis and irreducible polynomial independent.









Sumio Morioka, Yasunao Katayama, and Toshiyuki Yamane 



In these additional steps, a non-zero set {a0, a1, e0, e1, a0+a1} was created,
the if-condition part was divided (factored) by the elements of this set, an atom $1 = 0$
was obtained, and finally the truth of the if-condition part was determined. Then, the
variables l1 and l2 were eliminated, and the entire sentence was transformed into
$\exists((T \neq 0) \wedge \cdots \wedge (T = 0))$. After appending $T$ to the non-zero set and dividing the
atom $T = 0$ by $T$, the truth of the entire sentence was determined.
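
Why the division can leave the atom $1 = 0$ becomes visible when the if-condition is expanded symbolically. The sketch below uses a hypothetical toy representation (a polynomial with $GF(2)$ coefficients is a set of exponent tuples over the variables e0, e1, a0, a1) together with the syndrome definitions of Table 2. It checks that l0e2 equals e0·e1·(a0+a1)^2 as a formal polynomial, so dividing it by elements of the non-zero set leaves the constant 1. This illustrates the idea only and is not the authors' implementation.

```python
VARS = ("e0", "e1", "a0", "a1")

def var(name):
    """The polynomial consisting of the single variable `name`."""
    return frozenset([tuple(1 if v == name else 0 for v in VARS)])

def add(p, q):
    # Coefficients live in GF(2): equal monomials cancel.
    return p ^ q

def mul(p, q):
    out = set()
    for m1 in p:
        for m2 in q:
            monomial = tuple(x + y for x, y in zip(m1, m2))
            out ^= {monomial}      # toggle: 1 + 1 = 0 in GF(2)
    return frozenset(out)

e0, e1, a0, a1 = (var(v) for v in VARS)

# Syndromes as defined by the input constraints of Table 2.
s0 = add(e0, e1)
s1 = add(mul(e0, a0), mul(e1, a1))
s2 = add(mul(e0, mul(a0, a0)), mul(e1, mul(a1, a1)))

# The if-condition term of the algorithm formula.
l0e2 = add(mul(s0, s2), mul(s1, s1))

# Product of elements of the non-zero set {a0, a1, e0, e1, a0+a1}.
a_sum = add(a0, a1)
factored = mul(mul(e0, e1), mul(a_sum, a_sum))

print(l0e2 == factored)
```

Because the identity holds for formal polynomials with $GF(2)$ coefficients, it does not depend on the field size $m$, the irreducible polynomial or the basis.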

In addition, we have found that the factoring in Step 1c, which is considered to be the
most costly step, can be omitted from the verification process of general
Reed-Solomon decoders, if their input/output specifications are described as functions
of $e_i$ and $a_i$. In an $(n, n-2t)$ Reed-Solomon code, the mathematical formula that
evaluates the number of errors is $\prod_{0 \le i < t} e_i \prod_{0 \le i < j < t} (a_i + a_j)^2 = 0$ [7], and this is
logically equivalent to checking if the left term is factored by each element in
$\{e_0, \ldots, e_{t-1}, a_0 + a_1, \ldots, a_{t-2} + a_{t-1}\}$. All of these terms already exist in the correctness
criterion, and therefore, the factoring stage can be omitted. Although general RS
decoding algorithms are implemented as functions of syndromes, the above
mathematical formula is obtained from the correctness criteria, after eliminating the
syndrome variables that are defined by $e_i$ and $a_i$.

As a result, the proof time was shortened to 0.2 second (Pentium III 800 MHz).
This verification cost is the same for any n > 5, m > 3, irreducible polynomial and
basis. The proof was fully mechanized.

5 Conclusion 

In this paper we have defined a logic system for verification of arithmetic algorithms
over Galois fields $GF(2^m)$, and various proof techniques were investigated. We have
carried out a correctness proof of a practical $(n, n-4)$ Reed-Solomon decoding
algorithm in less than one second for any n > 5 and m > 3, by using a decision
procedure based on a combination of polynomial division and variable elimination
over $GF(2^m)$.

One of the reasons why we could shorten the proof time significantly is that the
correctness criterion is $2^p$-field-size independent. If any one of these conditions were
not satisfied, achieving efficient verification would be much more difficult, because a
decision over $GF(2)$ would be necessary even if the sentence is written over $GF(2^m)$.

The verification of general $(n,k)$ RS codes that can contain increasing numbers of
operations can be done efficiently by using our approach. To the best of the authors'
knowledge, our work is the first investigation of verifying a practical arithmetic
algorithm over $GF(2^m)$ within a reasonable proof time.

Abstract. In this paper we show how the classical job-shop scheduling
problem can be modeled as a special class of acyclic timed automata.
Finding an optimal schedule corresponds, then, to finding a shortest (in
terms of elapsed time) path in the timed automaton. This representation
provides new techniques for solving the optimization problem and,
more importantly, it allows us to model naturally more complex dynamic
resource allocation problems which are not captured so easily in traditional
models of operations research. We present several algorithms and
heuristics for finding the shortest paths in timed automata and test their
implementation in the tool Kronos on numerous benchmark examples.



1 Introduction 

A significant part of verification consists in checking the existence of certain
paths in very large transition graphs, given as a product (composition) of simpler
graphs. Such paths correspond to bad behaviors of the system under consideration.
On the other hand, in many application domains (optimal control, Markov
decision processes, scheduling) we are interested in selecting, among the possible
behaviors, one that optimizes some more sophisticated performance measure
(note that in “classical” verification we use a very simple performance measure
on behaviors, namely, they are either “good” or “bad”). Both verification and
optimization suffer from the state-explosion problem, also known as “the curse
of dimensionality”, and various methods and heuristics have been developed in
order to treat larger and larger problems. The main thrust of this work is to
explore the possibility of exporting some of the ideas developed within the
verification community, such as symbolic analysis of timed automata, to the domain
of optimal scheduling, where most of the effort was directed toward a constrained
optimization approach.

The observation underlying this paper is that classical scheduling and resource
allocation problems can be modeled very naturally using timed automata
whose runs correspond to feasible schedules. In this case, finding a time-optimal
schedule amounts to finding the shortest path (in terms of elapsed time) in
the automaton. This problem can be solved by some modifications in verification
tools for timed automata. Posing the problem in automata-theoretic terms
might open the way to an alternative class of heuristics for intractable scheduling
problems, coming from the experience of the verification community in analyzing
large systems, and this might lead in the future to better algorithms for
certain classes of scheduling problems. Even if they do not contribute to improving
the performance, automata-based models have a clear semantic advantage
over optimization-based models as they can model problems of scheduling under
uncertainty (in arrival time and duration of tasks) and suggest solutions in terms
of dynamic schedulers that observe the evolution of the plant.

26270 VHS (Verification of Hybrid Systems), and the AFIRST French-Israeli
collaboration project 970maefut5 (Hybrid Models of Industrial Plants).

Most of this work is devoted to establishing the link between the classical 
job-shop scheduling problem and timed automata and adapting the reachability 
algorithm of the tool Kronos to find shortest paths in timed automata. This is not 
a completely straightforward adaptation of standard graph-searching algorithms 
due to the density of the transition graph. We explore the performance limits 
of current timed automata technology, and although they cannot yet cope with 
the state-of-the-art in optimization, the results are rather encouraging. 

The rest of the paper is organized as follows. In Section 2 we give a short
introduction to the job-shop scheduling problem. In Section 3 we recall the
definition of timed automata and show how to transform a job-shop specification
into an acyclic timed automaton whose runs correspond to feasible schedules.
In Section 4 we describe several algorithms for solving the shortest-path problem
for such timed automata (either exactly or approximately) and report the
performance results of their implementation on numerous benchmark examples.



2 Job-Shop Scheduling 

The job-shop scheduling problem is a generic resource allocation problem in
which common resources (“machines”) are required at various time points (and
for given durations) by different tasks. The goal is to find a way to allocate
the resources such that all the tasks terminate as soon as possible (or “minimal
makespan” in the scheduling jargon). We consider throughout the paper a fixed
set $M$ of resources. Intuitively, a step is a pair $(m, d)$ where $m \in M$ and $d \in \mathbb{N}$,
indicating the required utilization of resource $m$ for time duration $d$. A job
specification is a finite sequence

$$J = (m_1, d_1), (m_2, d_2), \ldots, (m_k, d_k) \qquad (1)$$

of steps, stating that in order to accomplish job $J$, one needs to use machine $m_1$
for $d_1$ time, then use machine $m_2$ for $d_2$ time, etc. The formal definition below
tries to optimize the notations for the sequel.

Definition 1 (Job-Shop Specification). Let $M$ be a finite set of resources
(machines). A job specification over a set $M$ of resources is a triple $J = (k, \mu, d)$
where $k \in \mathbb{N}$ is the number of steps in $J$, $\mu : \{1..k\} \to M$ indicates which resource
is used at each step, and $d : \{1..k\} \to \mathbb{N}$ specifies the length of each step. A
job-shop specification is a set $\mathcal{J} = \{J^1, \ldots, J^n\}$ of jobs with $J^i = (k^i, \mu^i, d^i)$.

We make the following assumptions: 1) A job can wait an arbitrary amount of
time between two steps. 2) Once a job starts to use a machine, it is not preempted
until the step terminates. 3) Each machine is used exactly once by every job.^1
We denote $\mathbb{R}_+$ by $T$, abuse $\mathcal{J}$ for $\{1, \ldots, n\}$ and let $K = \{1, \ldots, k\}$.

Definition 2 (Feasible Schedules). Let $\mathcal{J} = \{J^1, \ldots, J^n\}$ be a job-shop
specification. A feasible schedule for $\mathcal{J}$ is a relation $S \subseteq \mathcal{J} \times K \times T$ so that $(i, j, t) \in S$
indicates that job $J^i$ is busy doing its $j$-th step at time $t$ and, hence, occupies
machine $\mu^i(j)$. A feasible schedule should satisfy the following conditions:

1. Ordering: if $(i, j, t) \in S$ and $(i, j', t') \in S$ then $j < j'$ implies $t < t'$ (steps of
the same job are executed in order).

2. Covering and Non-Preemption: For every $i \in \mathcal{J}$ and $j \in K$, the set
$\{t : (i, j, t) \in S\}$ is a non-empty set of the form $[r, r + d]$ for some $r \in T$ and
$d \geq d^i(j)$ (every step is executed continuously until completion).^2

3. Mutual Exclusion: For every $i, i' \in \mathcal{J}$ with $i \neq i'$, $j, j' \in K$ and $t \in T$, if $(i, j, t) \in S$ and
$(i', j', t) \in S$ then $\mu^i(j) \neq \mu^{i'}(j')$ (two steps of different jobs which execute
at the same time do not use the same machine).

The length $|S|$ of a schedule is the maximal $t$ over all $(i, j, t) \in S$. The optimal
job-shop scheduling problem is to find a schedule of a minimal length. This problem
is known to be NP-hard. From the relational definition of schedules one can
derive the following commonly used definitions:

1. The machine allocation function $\alpha : M \times T \to \mathcal{J}$ stating which job occupies
a machine at any time, defined as $\alpha(m, t) = i$ if $(i, j, t) \in S$ and $\mu^i(j) = m$.

2. The task progress function $\beta : \mathcal{J} \times T \to M$ stating what machine is used by
a job at a given time, defined as $\beta(i, t) = m$ if $(i, j, t) \in S$ and $\mu^i(j) = m$.

These functions are partial: a machine or a job might be idle at certain times.

Example 1: Consider $M = \{m_1, m_2\}$ and two jobs $J^1 = (m_1, 4), (m_2, 5)$ and
$J^2 = (m_1, 3)$. Two schedules $S_1$ and $S_2$ appear in Figure 1. The length of $S_1$ is
9 and it is the optimal schedule.
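
The feasibility conditions and the schedule lengths in Example 1 can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: a schedule is stored as one (start, end) interval per step rather than as a relation over time points, jobs and steps are indexed from 0, and two intervals on the same machine that merely touch at an endpoint are treated as non-conflicting. The names `is_feasible` and `length` are hypothetical.

```python
from typing import Dict, List, Tuple

Job = List[Tuple[str, int]]                            # steps: (machine, duration)
Schedule = Dict[Tuple[int, int], Tuple[float, float]]  # (job, step) -> (start, end)

def is_feasible(jobs: List[Job], sched: Schedule) -> bool:
    occupied = []                                      # (machine, job, start, end)
    for i, job in enumerate(jobs):
        previous_end = 0.0
        for j, (machine, duration) in enumerate(job):
            if (i, j) not in sched:
                return False                           # covering: every step must run
            start, end = sched[(i, j)]
            if start < previous_end:
                return False                           # ordering within a job
            if end - start < duration:
                return False                           # step must run to completion
            occupied.append((machine, i, start, end))
            previous_end = end
    for x, (m1, i1, s1, e1) in enumerate(occupied):
        for m2, i2, s2, e2 in occupied[x + 1:]:
            if m1 == m2 and i1 != i2 and s1 < e2 and s2 < e1:
                return False                           # mutual exclusion
    return True

def length(sched: Schedule) -> float:
    return max(end for _, end in sched.values())

# Example 1: J1 = (m1, 4), (m2, 5) and J2 = (m1, 3).
jobs = [[("m1", 4), ("m2", 5)], [("m1", 3)]]
S1 = {(0, 0): (0, 4), (0, 1): (4, 9), (1, 0): (4, 7)}    # J1 gets m1 first
S2 = {(1, 0): (0, 3), (0, 0): (3, 7), (0, 1): (7, 12)}   # J2 gets m1 first
print(is_feasible(jobs, S1), length(S1), is_feasible(jobs, S2), length(S2))
```

Giving $m_1$ to the first job first lets its second step start as early as possible, which is why $S_1$ is the shorter of the two schedules.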

We conclude this section with an observation concerning optimal schedules
which will be used later. We say that a schedule $S$ exhibits laziness at step $j$ of
job $i$ if immediately before starting that step there is an interval in which both
the job and the corresponding resource are idle. For example in the schedule $S$
of Figure 2 there is a laziness at (2, 1). In the job-shop setting, where there are
no logical dependencies among the jobs, such idling is of no use. Note that a
waiting period which is not adjacent to the beginning of the step, e.g. step (3, 1)
of the same schedule, is not considered as laziness.

Definition 3 (Lazy Schedules). Let $S$ be a schedule, let $i$ be a job and $j$ a
step with $\mu^i(j) = m$ which starts at time $t$. We say that $S$ exhibits laziness at
$(i, j)$ if there is a time $r < t$ such that for every $t' \in [r, t)$, $\beta(i, t') = \bot$ and for
every $i' \neq i$, $\beta(i', t') \neq m$. A schedule $S$ is non-lazy if it exhibits no laziness.
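
Definition 3 can be turned into a small check. The sketch below assumes a feasible schedule stored as one (start, end) interval per step, with jobs and steps indexed from 0; this representation and the name `lazy_steps` are illustrative assumptions, not part of the paper.

```python
def lazy_steps(jobs, sched):
    """Return the (job, step) pairs at which the schedule exhibits laziness."""
    lazy = []
    for (i, j), (start, _) in sched.items():
        machine = jobs[i][j][0]
        # The job is idle since the end of its previous step (or since time 0).
        job_idle_since = sched[(i, j - 1)][1] if j > 0 else 0
        # The machine is idle since the last other job released it.
        machine_idle_since = max(
            [end for (i2, j2), (_, end) in sched.items()
             if i2 != i and jobs[i2][j2][0] == machine and end <= start],
            default=0)
        # Laziness: an interval [r, start) with r < start where both are idle.
        if max(job_idle_since, machine_idle_since) < start:
            lazy.append((i, j))
    return sorted(lazy)

# Two jobs: J1 = (m1, 4), (m2, 5) and J2 = (m1, 3).
jobs = [[("m1", 4), ("m2", 5)], [("m1", 3)]]
tight = {(0, 0): (0, 4), (0, 1): (4, 9), (1, 0): (4, 7)}
late = {(0, 0): (0, 4), (0, 1): (4, 9), (1, 0): (5, 8)}   # J2 starts one unit late
print(lazy_steps(jobs, tight), lazy_steps(jobs, late))
```

In `late`, machine m1 is free from time 4 and the second job is idle, yet the step starts only at time 5, so the step is lazy.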

^1 This assumption simplifies the presentation but maintains the inherent complexity.
^2 Note that we allow a job to occupy the machine after the step has terminated. This
helps in simplifying the timed automata but has no effect on the optimal solution.

Fig. 1. Two schedules $S_1$ and $S_2$ visualized as the machine allocation function
$\alpha$ and the task progress function $\beta$.

Claim 1 (Non-lazy Optimal Schedules) Every lazy schedule $S$ can be transformed
into a non-lazy schedule $\hat{S}$ with $|\hat{S}| \leq |S|$. Hence every job-shop
specification admits an optimal non-lazy schedule.



Sketch of Proof: The proof is by taking a lazy schedule $S$ and transforming
it into a schedule $S'$ where laziness occurs “later”. A schedule defines a partial
order relation $\prec$ on $\mathcal{J} \times K$ which is generated by the ordering constraints
of each job, $(i, j) \prec (i, j + 1)$, and by the choices made in the case of conflicts,
$(i, j) \prec (i', j')$ if $\mu^i(j) = \mu^{i'}(j')$ and $(i, j)$ precedes $(i', j')$ in $S$. The
laziness elimination procedure picks a lazy step $(i, j)$ which is minimal with
respect to $\prec$ and shifts its start time backward to $t'$, to yield a new schedule
$S'$, such that $|S'| \leq |S|$. Moreover, the partial order associated with $S'$ is
identical to the one induced by $S$. The laziness at $(i, j)$ is thus eliminated, and
this might create new manifestations of laziness at later steps which are eliminated
in the subsequent stages of the procedure (see illustration in Figure 2).
Let $L(S) \subseteq \mathcal{J} \times K$ be the set of steps that are not preceded by laziness, namely
$L(S) = \{(i, j) : \forall (i', j') \preceq (i, j)$ there is no laziness in $(i', j')\}$. Clearly the
laziness removal procedure increases $L(S)$ and terminates due to finiteness.



Fig. 2. Removing laziness from a schedule $S$: first we eliminate laziness at (2, 1)
and create new ones at (2, 2) and (3, 1) in $S'$, and those are further removed
until a non-lazy schedule $\hat{S}$ is obtained. The dashed line indicates the frontier
between $L(S)$ and the rest of the steps.

3 Timed Automata

Timed automata are automata augmented with continuous clock variables
whose values grow uniformly at every state. Clocks are reset to zero at
certain transitions and tests on their values are used as pre-conditions for
transitions. Hence they are ideal for describing concurrent time-dependent behaviors.

Definition 4 (Timed Automaton). A timed automaton is a tuple $\mathcal{A} = (Q, C, s, f, \Delta)$
where $Q$ is a finite set of states, $C$ is a finite set of clocks, and
$\Delta$ is a transition relation consisting of elements of the form $(q, \phi, \rho, q')$ where $q$
and $q'$ are states, $\rho \subseteq C$ and $\phi$ (the transition guard) is a boolean combination of
formulae of the form $(c \in I)$ for some clock $c$ and some integer-bounded interval
$I$. States $s$ and $f$ are the initial and final states, respectively.

A clock valuation is a function $\mathbf{v} : C \to \mathbb{R}_+ \cup \{0\}$, or equivalently a
$|C|$-dimensional vector over $\mathbb{R}_+$. We denote the set of all clock valuations by $\mathcal{H}$. A
configuration of the automaton is hence a pair $(q, \mathbf{v}) \in Q \times \mathcal{H}$ consisting of a
discrete state (sometimes called “location”) and a clock valuation. Every subset
$\rho \subseteq C$ induces a reset function $\mathrm{Reset}_\rho$ defined for every clock valuation
$\mathbf{v}$ and every clock variable $c \in C$ as

$$\mathrm{Reset}_\rho \mathbf{v}(c) = \begin{cases} 0 & \text{if } c \in \rho \\ \mathbf{v}(c) & \text{if } c \notin \rho \end{cases}$$

That is, $\mathrm{Reset}_\rho$ resets to zero all the clocks in $\rho$ and leaves the others unchanged.
We use $\mathbf{1}$ to denote the unit vector $(1, \ldots, 1)$ and $\mathbf{0}$ for the zero vector.

A step of the automaton is one of the following:

— A discrete step: $(q, \mathbf{v}) \xrightarrow{0} (q', \mathbf{v}')$, where there exists $\delta = (q, \phi, \rho, q') \in \Delta$,
such that $\mathbf{v}$ satisfies $\phi$ and $\mathbf{v}' = \mathrm{Reset}_\rho(\mathbf{v})$.

— A time step: $(q, \mathbf{v}) \xrightarrow{t} (q, \mathbf{v} + t\mathbf{1})$, $t \in \mathbb{R}_+$.

A run of the automaton starting from $(q_0, \mathbf{v}_0)$ is a finite sequence of steps

$$\xi : (q_0, \mathbf{v}_0) \xrightarrow{t_1} (q_1, \mathbf{v}_1) \xrightarrow{t_2} \cdots \xrightarrow{t_n} (q_n, \mathbf{v}_n).$$

The logical length of such a run is $n$ and its metric length is $|\xi| = t_1 + t_2 + \cdots + t_n$.
Note that discrete transitions take no time.

A lazy run is a run containing a fragment

$$(q, \mathbf{v}) \xrightarrow{t} (q, \mathbf{v} + t\mathbf{1}) \xrightarrow{0} (q', \mathbf{v}')$$

where the transition taken at $(q, \mathbf{v} + t\mathbf{1})$ is enabled already at $(q, \mathbf{v} + t'\mathbf{1})$ for some
$t' < t$. In a non-lazy run whenever a transition is taken from a state, it is taken
at the earliest possible time. Clearly, from any given configuration there are only
finitely many non-lazy continuations and hence for every $k$ there are only finitely
many non-lazy runs with $k$ steps.

Job-Shop Scheduling Using Timed Automata

Next we construct for every job $J = (k, \mu, d)$ a timed automaton with one
clock such that for every step $j$ such that $\mu(j) = m$ there will be two states: a
state $\bar{m}$ which indicates that the job is waiting to start the step and a state $m$
indicating that the job is executing the step. Upon entering $m$ the clock is reset
to zero, and the automaton can leave the state only after time $d(j)$ has elapsed.
Let $\bar{M} = \{\bar{m} : m \in M\}$ and let $\bar{\mu} : K \to \bar{M}$ be an auxiliary function such
that $\bar{\mu}(j) = \bar{m}$ whenever $\mu(j) = m$. Note that the clock $c$ is inactive at state $\bar{m}$
because it is reset to zero without being tested upon leaving $\bar{m}$.

Definition 5 (Timed Automaton for a Job). Let $J = (k, \mu, d)$ be a job. Its
associated timed automaton is $\mathcal{A} = (Q, \{c\}, \Delta, s, f)$ with $Q = P \cup \bar{P} \cup \{f\}$
where $P = \{\mu(1), \ldots, \mu(k)\}$ and $\bar{P} = \{\bar{\mu}(1), \ldots, \bar{\mu}(k)\}$. The transition relation
$\Delta$ consists of the following tuples

$(\bar{\mu}(j), \mathit{true}, \{c\}, \mu(j))$, $j = 1..k$
$(\mu(j), c \geq d(j), \emptyset, \bar{\mu}(j + 1))$, $j = 1..k-1$
$(\mu(k), c \geq d(k), \emptyset, f)$

The initial state is $\bar{\mu}(1)$.

The automata for the two jobs in Example 1 are depicted in Figure 3.

For every automaton $\mathcal{A}$ we define a ranking function $g : Q \times \mathbb{R}_+ \to \mathbb{R}_+$ such
that $g(q, v)$ is a lower bound on the time remaining until $f$ is reached from $(q, v)$:

$g(f, v) = 0$
$g(\bar{\mu}(j), v) = \sum_{l=j}^{k} d(l)$
$g(\mu(j), v) = g(\bar{\mu}(j), v) - \min\{v, d(j)\}$
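
The construction of Definition 5 and the ranking function can be sketched as follows. The encoding is an assumption made for illustration: a job is a list of (machine, duration) steps indexed from 0, a waiting state is the pair ("wait", j), an executing state is ("run", j), a guard is stored as the lower bound on the clock (None for the guard true), and the names `job_automaton` and `rank` are hypothetical.

```python
def job_automaton(job):
    """Return (initial state, final state, transitions) for one job.

    Each transition is (source, guard_lower_bound, resets_clock, target).
    """
    k = len(job)
    transitions = []
    for j in range(k):
        # Start step j: guard true, clock reset to zero.
        transitions.append((("wait", j), None, True, ("run", j)))
        # Finish step j: guard c >= d(j), no reset.
        target = ("wait", j + 1) if j + 1 < k else "f"
        transitions.append((("run", j), job[j][1], False, target))
    return ("wait", 0), "f", transitions

def rank(job, state, v):
    """Lower bound on the time remaining until the final state is reached."""
    if state == "f":
        return 0
    kind, j = state
    remaining = sum(duration for _, duration in job[j:])
    if kind == "wait":
        return remaining
    return remaining - min(v, job[j][1])   # part of step j is already done

J1 = [("m1", 4), ("m2", 5)]
initial, final, transitions = job_automaton(J1)
print(initial, len(transitions), rank(J1, ("run", 0), 1))
```

For the first job of Example 1, the bound is 9 in the initial state and drops to 8 after one time unit of executing the first step.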






Fig. 3. The automata corresponding to the jobs $J^1 = (m_1, 4), (m_2, 5)$ and
$J^2 = (m_1, 3)$.

In order to obtain the timed automaton representing the whole job-shop
specification we need to compose the automata for the individual tasks. The
composition is rather standard; the only particular feature is the enforcement of
mutual exclusion constraints by forbidding global states in which two or more
automata are in a state corresponding to the same resource $m$. An $n$-tuple
$q = (q^1, \ldots, q^n) \in (M \cup \bar{M} \cup \{f\})^n$ is said to be conflicting if it contains two
components $q^a$ and $q^b$ such that $q^a = q^b = m \in M$.

Definition 6 (Mutual Exclusion Composition). Let $\mathcal{J} = \{J^1, \ldots, J^n\}$ be
a job-shop specification and let $\mathcal{A}^i = (Q^i, C^i, \Delta^i, s^i, f^i)$ be the automaton
corresponding to each $J^i$. Their mutual exclusion composition is the automaton
$\mathcal{A} = (Q, C, \Delta, s, f)$ such that $Q$ is the restriction of $Q^1 \times \cdots \times Q^n$ to non-conflicting
states, $C = C^1 \cup \cdots \cup C^n$, $s = (s^1, \ldots, s^n)$, $f = (f^1, \ldots, f^n)$ and the transition
relation $\Delta$ contains all the tuples of the form

$((q^1, \ldots, q^a, \ldots, q^n), \phi, \rho, (q^1, \ldots, p^a, \ldots, q^n))$

such that $(q^a, \phi, \rho, p^a) \in \Delta^a$ for some $a$ and the global states $(q^1, \ldots, q^a, \ldots, q^n)$
and $(q^1, \ldots, p^a, \ldots, q^n)$ are non-conflicting.

The composition of the two automata of Figure 3 appears in Figure 4.
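
The state space of the composition can be enumerated directly for small instances. The sketch below assumes that each machine is used at most once per job (assumption 3 of Section 2), so a local state can be named by a kind ("wait", "run" or "f") and a machine; the function names are hypothetical and only the discrete states, not the transitions or clocks, are built.

```python
from itertools import product

def local_states(job):
    """Discrete states of the automaton of one job, in order of traversal."""
    states = []
    for machine, _ in job:
        states.append(("wait", machine))   # waiting to start the step
        states.append(("run", machine))    # executing the step
    states.append(("f", None))             # final state
    return states

def conflicting(global_state):
    """True if two components execute on the same machine."""
    running = [m for kind, m in global_state if kind == "run"]
    return len(running) != len(set(running))

def global_states(jobs):
    """Restriction of the product of local state sets to non-conflicting states."""
    return [q for q in product(*(local_states(job) for job in jobs))
            if not conflicting(q)]

# Example 1: J1 = (m1, 4), (m2, 5) and J2 = (m1, 3).
jobs = [[("m1", 4), ("m2", 5)], [("m1", 3)]]
states = global_states(jobs)
print(len(states))
```

For Example 1 the product has 5 × 3 local state pairs, and the only conflicting pair is the one in which both jobs execute on $m_1$.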

A run of $\mathcal{A}$ is complete if it starts at $(s, \mathbf{0})$ and the last step is a transition
to $f$. From every complete run $\xi$ one can derive in an obvious way a schedule
relation $S_\xi$ such that $(i, j, t) \in S_\xi$ if at time $t$ the $i$-th component of the automaton
is at state $\mu(j)$. The length of $S_\xi$ coincides with the metric length of $\xi$.

Claim 2 (Runs and Schedules) Let $\mathcal{A}$ be the automaton generated for the
job-shop specification $\mathcal{J}$ according to Definitions 1 and 2. Then:

1. For every complete run $\xi$ of $\mathcal{A}$, its associated schedule $S_\xi$ is feasible for $\mathcal{J}$.

2. For every feasible schedule $S$ for $\mathcal{J}$ there is a run $\xi$ of $\mathcal{A}$ such that $S_\xi = S$.
Moreover, if $S$ is non-lazy so is $\xi$.

Note that non-laziness of the run does not imply non-laziness of the schedule.

Corollary 1 (Job-Shop Scheduling and Timed Automata). The optimal 
job-shop scheduling problem can be reduced to the problem of finding the shortest 
non-lazy path in a timed automaton. 
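
Corollary 1 suggests that only non-lazy behaviors need to be explored. The following brute-force sketch does this directly on the job-shop specification rather than on the timed automaton: at each point it picks a job and starts its next step as early as the job and the machine allow. It is a hypothetical illustration of the search space, not one of the Kronos-based algorithms of Section 4, and the name `optimal_makespan` is an assumption.

```python
def optimal_makespan(jobs):
    """Length of a shortest schedule, found by exhaustive non-lazy search."""
    n = len(jobs)
    best = [float("inf")]

    def search(next_step, job_free, machine_free, makespan):
        if makespan >= best[0]:
            return                                   # cannot improve: prune
        if all(next_step[i] == len(jobs[i]) for i in range(n)):
            best[0] = makespan                       # all jobs are finished
            return
        for i in range(n):
            j = next_step[i]
            if j == len(jobs[i]):
                continue
            machine, duration = jobs[i][j]
            # Start as early as possible: no idling when job and machine are free.
            start = max(job_free[i], machine_free.get(machine, 0))
            end = start + duration
            search(next_step[:i] + (j + 1,) + next_step[i + 1:],
                   job_free[:i] + (end,) + job_free[i + 1:],
                   {**machine_free, machine: end},
                   max(makespan, end))

    search((0,) * n, (0,) * n, {}, 0)
    return best[0]

# Example 1: J1 = (m1, 4), (m2, 5) and J2 = (m1, 3).
example = [[("m1", 4), ("m2", 5)], [("m1", 3)]]
print(optimal_makespan(example))
```

The number of explored orderings grows exponentially with the number of steps, which is consistent with the NP-hardness of the problem; the sketch is only practical for very small instances.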




Fig. 4. The global timed automaton for the two jobs.

The two schedules appearing in Figure 1 correspond to the following two
runs (we use the notation $\bot$ to indicate inactive clocks):

$S_1$:
$(\bar{m}_1, \bar{m}_1, \bot, \bot) \xrightarrow{0} (m_1, \bar{m}_1, 0, \bot) \xrightarrow{4} (m_1, \bar{m}_1, 4, \bot) \xrightarrow{0} (\bar{m}_2, \bar{m}_1, \bot, \bot) \xrightarrow{0} (m_2, \bar{m}_1, 0, \bot)$
$\xrightarrow{0} (m_2, m_1, 0, 0) \xrightarrow{3} (m_2, m_1, 3, 3) \xrightarrow{0} (m_2, f, 3, \bot) \xrightarrow{2} (m_2, f, 5, \bot) \xrightarrow{0} (f, f, \bot, \bot)$

$S_2$:
$(\bar{m}_1, \bar{m}_1, \bot, \bot) \xrightarrow{0} (\bar{m}_1, m_1, \bot, 0) \xrightarrow{3} (\bar{m}_1, m_1, \bot, 3) \xrightarrow{0} (\bar{m}_1, f, \bot, \bot) \xrightarrow{0} (m_1, f, 0, \bot)$
$\xrightarrow{4} (m_1, f, 4, \bot) \xrightarrow{0} (\bar{m}_2, f, \bot, \bot) \xrightarrow{0} (m_2, f, 0, \bot) \xrightarrow{5} (m_2, f, 5, \bot) \xrightarrow{0} (f, f, \bot, \bot)$

Some words are in order to describe the structure of the job-shop timed
automaton. First, it is an acyclic automaton and its state-space admits a natural
partial-order. It can be partitioned into levels according to the number of
discrete transitions from $s$ to the state. All transitions indicate either a component
moving from an active to an inactive state (these are guarded by conditions of
the form $c_i \geq d$), or a component moving into an active state (these are labeled
by resets $c_i := 0$). There are no staying conditions (invariants) and the
automaton can stay forever in any given state. Recall that in a timed automaton, the
transition graph might be misleading, because two or more transitions entering
the same discrete state, e.g. transitions to $(m_2, f)$ in Figure 4, might enter it
with different clock valuations, and hence lead to different continuations.
Consequently, algorithms for verification and quantitative analysis might need to
explore all the nodes in the unfolding of the automaton into a tree. Two
transitions outgoing from the same state might represent a choice of the scheduler,
for example, the two transitions outgoing from the initial state represent the
decision to whom to give first the resource $m_1$. On the other hand some
duplications of paths are just artifacts due to interleaving, for example, the two paths
outgoing from $(\bar{m}_2, \bar{m}_1)$ to $(m_2, m_1)$ are practically equivalent.

Another useful observation is that from every job-shop specification $J$ one
can construct its reverse problem $J'$, where the order of every individual
job is reversed. Every feasible schedule for $J'$ can be transformed easily
into a feasible schedule for $J$ having the same length. Doing a forward search
on the automaton for $J'$ is thus equivalent to doing a backward search on the
automaton for $J$.
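
The reversal can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: a job is represented as a list of
(machine, duration) steps and a schedule as a map from (job index, step index)
to start time. These representations, the function names and the toy instance
are hypothetical and not taken from the paper.

```python
def reverse_problem(jobs):
    """Return J': every job's sequence of (machine, duration) steps reversed."""
    return [list(reversed(job)) for job in jobs]


def reverse_schedule(jobs, starts, length):
    """Mirror a schedule in time.

    A step that occupied [s, s + d] in the schedule for J occupies
    [length - (s + d), length - s] in the schedule for J'.
    """
    mirrored = {}
    for j, job in enumerate(jobs):
        last = len(job) - 1
        for k, (_machine, duration) in enumerate(job):
            # Step k of the job in J becomes step (last - k) in J'.
            mirrored[(j, last - k)] = length - (starts[(j, k)] + duration)
    return mirrored


# Toy instance: J1 = (m1, 4), (m2, 5) and J2 = (m1, 3).
jobs = [[("m1", 4), ("m2", 5)], [("m1", 3)]]
starts = {(0, 0): 0, (0, 1): 4, (1, 0): 4}   # a feasible schedule of length 9
print(reverse_problem(jobs))
print(reverse_schedule(jobs, starts, 9))
```

Mirroring preserves both the precedence constraints (in reversed order) and
the mutual exclusion on machines, which is why the two schedules have the same
length.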



4 Shortest Paths in Timed Automata 



In this section we describe how the symbolic forward reachability algorithm
of Kronos is adapted to find a shortest path in a job-shop timed automaton.
Although the non-laziness corollary allows us to use enumerative methods in
the case of deterministic job-shop problems, we start with algorithms that do
not take advantage of non-laziness, both for the completeness of the
presentation and as a preparation for more complex scheduling problems where
non-laziness results do not hold. Standard shortest-path algorithms operate on
discrete graphs with numerical weights assigned to their edges. The transition
graphs of timed automata are uncountable and hence not amenable to enumerative
algorithms.

We recall some commonly-used definitions concerning timed automata. A
zone is a subset of $\mathcal{H}$ consisting of points satisfying a conjunction
of inequalities of the form $c_i - c_j \geq d$ or $c_i \geq d$. A symbolic
state is a pair $(q, Z)$ where $q$ is a discrete state and $Z$ is a zone. It
denotes the set of configurations $\{(q, z) : z \in Z\}$.
Symbolic states are closed under the following operations:



Footnote: One can, of course, discretize time into unit steps but this will
cause an enormous increase in the state-space of the automaton.
- The time successor of $(q, Z)$ is the set of configurations which are
  reachable from $(q, Z)$ by letting time progress:

  $$\mathrm{Post}^{t}(q, Z) = \{(q, z + r\mathbf{1}) : z \in Z,\ r \geq 0\}.$$

  We say that $(q, Z)$ is time-closed if $(q, Z) = \mathrm{Post}^{t}(q, Z)$.

- The $\delta$-transition successor of $(q, Z)$ is the set of configurations
  reachable from $(q, Z)$ by taking the transition
  $\delta = (q, \phi, \rho, q') \in \Delta$:

  $$\mathrm{Post}^{\delta}(q, Z) = \{(q', \mathrm{Reset}_{\rho}(z)) : z \in Z \cap \phi\}.$$

- The $\delta$-successor of a time-closed symbolic state $(q, Z)$ is the set of
  configurations reachable by a $\delta$-transition followed by passage of time:

  $$\mathrm{Succ}^{\delta}(q, Z) = \mathrm{Post}^{t}(\mathrm{Post}^{\delta}(q, Z)).$$

- The successors of $(q, Z)$ is the set of all its $\delta$-successors:

  $$\mathrm{Succ}(q, Z) = \bigcup_{\delta \in \Delta} \mathrm{Succ}^{\delta}(q, Z).$$

To compute all the reachable configurations of the job-shop automaton we use 
a variant of the standard forward reachability algorithm for timed automata, 
specialized for acyclic graphs. 

Algorithm 1 (Forward Reachability for Acyclic Timed Automata)

    Waiting := {Post^t(s, 0)};
    while Waiting ≠ ∅ do
        Pick (q, Z) ∈ Waiting;
        For every (q', Z') ∈ Succ(q, Z):
            Insert (q', Z') into Waiting;
        Remove (q, Z) from Waiting
    end

This algorithm solves the reachability problem for timed automata, a trivial
problem for job-shop automata since all complete runs lead to $f$. Its
adaptation for finding shortest paths is rather straightforward. All we do is
to use a clock-space $\mathcal{H}'$, which is the clock-space of $\mathcal{A}$
augmented with an additional clock $c_{n+1}$ which is never reset. For any
symbolic state $(q, Z)$ reachable in the modified automaton $\mathcal{A}'$, if
$(v_1, \ldots, v_n, v_{n+1}) \in Z$ then $(q, (v_1, \ldots, v_n))$ is reachable
in $\mathcal{A}$ within any time $t \geq v_{n+1}$. Consequently, the length of
the shortest run from the initial state to $q$ via the (qualitative) path which
generated $(q, Z)$ is

$$G(q, Z) = \min\{v_{n+1} : (v_1, \ldots, v_n, v_{n+1}) \in Z\}$$

and the length of the optimal schedule is

$$\min\{G(f, Z) : (f, Z) \text{ is reachable in } \mathcal{A}'\}.$$

Hence, running Algorithm 1 on $\mathcal{A}'$ is guaranteed to find the minimal
schedule.
The rest of the section is devoted to several improvements of this algorithm,
whose naive implementation will generate a symbolic state for almost every node
in the unfolding of the automaton. Experimental results appear in Table 1.

Inclusion Test: This is a common method used in Kronos for reducing the
number of symbolic states in verification. It is based on the fact that
$Z \subseteq Z'$ implies $\mathrm{Succ}^{\delta}(q, Z) \subseteq
\mathrm{Succ}^{\delta}(q, Z')$ for every $\delta \in \Delta$. Hence, whenever a
new symbolic state $(q, Z)$ is generated, it is compared with any other
$(q, Z')$ in the waiting list: if $Z \subseteq Z'$ then $(q, Z)$ is not
inserted, and if $Z' \subseteq Z$, $(q, Z')$ is removed from the list. Note
that allowing the automaton to stay indefinitely in any state makes the
explored zones "upward-closed" with respect to absolute time and increases
significantly the effectiveness of the inclusion test.
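
The symbolic-state operations and the inclusion test can be sketched with
difference bound matrices (DBMs), a standard representation of zones. This is
a minimal sketch under assumptions: the text above does not say how Kronos
stores zones, only non-strict bounds are handled, and all function names are
illustrative. Entry `d[i][j]` bounds $c_i - c_j \leq d[i][j]$, with index 0
standing for the constant 0, so a guard $c_i \geq k$ is written as
$0 - c_i \leq -k$.

```python
import math

INF = math.inf


def canonical(d):
    """Tighten all bounds with Floyd-Warshall so that the DBM is canonical."""
    n = len(d)
    d = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d


def zero_zone(n):
    """The zone in which all n clocks equal 0."""
    return [[0] * (n + 1) for _ in range(n + 1)]


def time_successor(d):
    """Post^t: let time pass by dropping every upper bound c_i <= k."""
    d = [row[:] for row in d]
    for i in range(1, len(d)):
        d[i][0] = INF
    return d


def constrain(d, i, j, bound):
    """Intersect with the guard c_i - c_j <= bound and re-tighten."""
    d = [row[:] for row in d]
    d[i][j] = min(d[i][j], bound)
    return canonical(d)


def reset(d, x):
    """Reset clock x to 0 in a canonical DBM."""
    d = [row[:] for row in d]
    for j in range(len(d)):
        if j != x:
            d[x][j] = d[0][j]   # c_x - c_j is now 0 - c_j
            d[j][x] = d[j][0]   # c_j - c_x is now c_j - 0
    return d


def included(d1, d2):
    """Inclusion test for canonical, non-empty DBMs: every bound of Z1 is at least as tight."""
    return all(a <= b for r1, r2 in zip(d1, d2) for a, b in zip(r1, r2))


# Start with c1 = c2 = 0, let time pass, take a transition guarded by c1 >= 4
# that resets c1, then let time pass again: Succ = Post^t(Post^delta(...)).
z = time_successor(zero_zone(2))
z = reset(constrain(z, 0, 1, -4), 1)
succ = time_successor(z)
print(included(z, succ), included(succ, z))
```

In a waiting list, `included` would be applied in both directions, as described
above, to decide whether a new zone is dropped or replaces an existing one.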

Domination Test: The inclusion test removes a symbolic state only if all its
successors are included in those of another symbolic state. Since we are
interested only in optimal runs, we can apply stronger reductions that do not
preserve all runs, but still preserve the optimal ones. As an example consider
an automaton with two paths leading from the initial state to a state $q$, one
by first resetting $c_1$ and then $c_2$ and one in the reverse order. The zones
reachable via these paths are $Z_1 = \{c_3 \geq c_1 \geq c_2 \geq 0\}$ and
$Z_2 = \{c_3 \geq c_2 \geq c_1 \geq 0\}$, where $c_3$ is the additional clock
which measures absolute time. These zones are incomparable with respect to
inclusion; however, for every $t$ they share a "maximal" point
$(c_1, c_2, c_3) = (t, t, t)$ which corresponds to the respective non-lazy runs
along each of the paths. Hence it is sufficient to explore only one of the
symbolic states $(q, Z_1)$ and $(q, Z_2)$.

Let $(q, (v, t))$ and $(q, (v', t'))$ be two reachable configurations in
$Q \times \mathcal{H}'$. We say that $(v, t)$ dominates $(v', t')$ if
$t \leq t'$ and $v \geq v'$. Intuitively this means that $(q, v)$ was reached
not later than $(q, v')$ and with larger clock values, which implies that steps
active at $q$ started earlier along the run to $(q, v)$ and hence can terminate
earlier. It can be shown that for every reachable symbolic state $(q, Z)$, $Z$
contains an optimal point $(v^*, t^*)$ dominating every other point in $Z$.
This point, which is reachable via a non-lazy run, can be computed by letting
$t^* = G(q, Z)$ (earliest arrival time) and $v^* = (v_1^*, \ldots, v_n^*)$
where for every $i$,

$$v_i^* = \max\{v_i : (v_1, \ldots, v_i, \ldots, v_n, t^*) \in Z\}.$$

We say that $Z_1$ dominates $Z_2$ if $(v_1^*, t_1^*)$ dominates
$(v_2^*, t_2^*)$. We apply the domination test in the same manner as the
inclusion test to obtain a further reduction of the number of symbolic states
explored.
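
The domination relation on points is simple enough to state directly in code.
The following is a minimal sketch, assuming a point is stored as a pair
`(v, t)` with `v` a tuple of clock values and `t` the absolute time; the
waiting-list helper `insert_with_domination` is a hypothetical name used only
for illustration.

```python
def dominates(p, q):
    """(v, t) dominates (v', t') if t <= t' and v >= v' componentwise."""
    (v, t), (v2, t2) = p, q
    return t <= t2 and all(a >= b for a, b in zip(v, v2))


def insert_with_domination(waiting, state, point):
    """Add (state, point) unless it is dominated; drop entries it dominates."""
    for other_state, other_point in waiting:
        if other_state == state and dominates(other_point, point):
            return waiting                      # nothing new to explore
    kept = [(s, p) for s, p in waiting
            if not (s == state and dominates(point, p))]
    kept.append((state, point))
    return kept


waiting = []
waiting = insert_with_domination(waiting, "q", ((3, 1), 5))
waiting = insert_with_domination(waiting, "q", ((3, 3), 5))   # replaces the first
waiting = insert_with_domination(waiting, "q", ((2, 2), 6))   # dominated, ignored
print(waiting)
```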

Best-First Search: The next improvement consists in using a more intelligent
search order than breadth-first. To this end we define an evaluation function
$E : Q \times \mathcal{H}' \to \mathbb{R}_+$ for estimating the quality of
configurations and symbolic states:

$$E((q_1, \ldots, q_n), (v_1, \ldots, v_n, t)) = t + \max\{g^i(q_i, v_i)\}_{i=1}^{n}$$

where $g^i$ is the previously-defined ranking function associated with each
automaton $\mathcal{A}^i$. Note that $\max\{g^i\}$ gives the most optimistic
estimation of the remaining time, assuming that no job will have to wait. The
extension of this function to zones is $E(q, Z) = E(q, (v^*, t^*))$. It is not
hard to see that $E(q, Z)$ gives a lower bound on the length of every complete
run which passes through $(q, Z)$.
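
A small sketch of the evaluation function follows. The ranking function is
defined in an earlier section that is not reproduced here, so the version
below is an assumption: it takes $g^i$ to be the time job $i$ still needs if
it never waits, i.e. the sum of the remaining step durations minus the time
already spent in the active step. The state encoding (step index plus an
"active" flag) is likewise hypothetical.

```python
def remaining(durations, step, active, clock):
    """Assumed ranking function g: time the job still needs if it never waits."""
    if step >= len(durations):            # the job is in its final state f
        return 0
    rest = sum(durations[step:])
    return rest - min(clock, durations[step]) if active else rest


def evaluate(jobs, state, clocks, t):
    """E = t + max_i g^i(q_i, v_i): a lower bound on any run through this point."""
    return t + max(remaining(d, k, a, v)
                   for d, (k, a), v in zip(jobs, state, clocks))


jobs = [[4, 5], [3]]                      # step durations only
state = [(0, True), (0, False)]           # job 1 executing step 0, job 2 waiting
print(evaluate(jobs, state, clocks=[2, 0], t=2))
```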






Yasmina Abdeddaim and Oded Maler 



Table 1. The results for n jobs with 4 tasks. Columns #j, #ds and #tree show,
respectively, the number of jobs, the number of discrete states in the automaton
and the number of different reachable symbolic states (which is close to the
number of nodes in the unfolding of the automaton into a tree). The rest of the
table shows the performance, in terms of the number of explored symbolic states
and time (in seconds), of algorithms employing, progressively, the inclusion test,
the domination test, and the best-first search (m.o. indicates memory overflow).

      Problem size        |   Inclusion    |   Domination    |  Best-first
 #j      #ds      #tree   |     #s    time |      #s    time |     #s   time
 -------------------------+----------------+-----------------+-------------
  2       77        632   |    212       1 |     100       1 |     38      1
  3      629      67298   |   5469       2 |    1143       1 |    384      1
  4     4929     279146   | 159994     126 |   11383       2 |   1561      1
  5    37225       m.o.   |   m.o.    m.o. |  116975      88 |   2810      1
  6   272125       m.o.   |   m.o.    m.o. | 1105981    4791 |  32423      6


The modified algorithm now orders the waiting list of symbolic states ac- 
cording to their evaluation (and applies the inclusion and domination tests upon 
insertion to the list). This algorithm is guaranteed to produce the optimal path 
because it stops the exploration only when it is clear that the unexplored states 
cannot lead to schedules better than those found so far. 

Algorithm 2 (Best-First Forward Reachability)

    Waiting := {Post^t(s, 0)};
    Best := ∞
    (q, Z) := first in Waiting;
    while Best > E(q, Z)
    do
        For every (q', Z') ∈ Succ(q, Z):
            if q' = f then
                Best := min{Best, E(q', Z')}
            else
                Insert (q', Z') into Waiting;
        Remove (q, Z) from Waiting
        (q, Z) := first in Waiting;
    end
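
The control structure of Algorithm 2 can be sketched generically in Python
with a priority queue. This is an illustration, not the Kronos
implementation: states are opaque values, `successors`, `is_final` and
`estimate` are hypothetical callbacks standing for Succ, the test $q' = f$ and
the lower bound $E$, the inclusion and domination tests are omitted, and an
empty waiting list (which Algorithm 2 does not consider) simply ends the loop.

```python
import heapq
import itertools
import math


def best_first(start, successors, is_final, estimate):
    """Explore states in order of increasing lower bound `estimate`."""
    tie = itertools.count()               # tie-breaker, so states are never compared
    waiting = [(estimate(start), next(tie), start)]
    best = math.inf
    # Stop once the most promising waiting state cannot beat `best`.
    while waiting and best > waiting[0][0]:
        _, _, state = heapq.heappop(waiting)
        for nxt in successors(state):
            if is_final(nxt):
                best = min(best, estimate(nxt))
            else:
                heapq.heappush(waiting, (estimate(nxt), next(tie), nxt))
    return best


# Toy acyclic graph: a state is (node, elapsed time).
EDGES = {"s": [("a", 1), ("b", 4)], "a": [("f", 5)], "b": [("f", 1)], "f": []}
REMAINING = {"s": 5, "a": 5, "b": 1, "f": 0}    # optimistic time still needed


def successors(state):
    node, elapsed = state
    return [(nxt, elapsed + d) for nxt, d in EDGES[node]]


def is_final(state):
    return state[0] == "f"


def estimate(state):
    node, elapsed = state
    return elapsed + REMAINING[node]


print(best_first(("s", 0), successors, is_final, estimate))
```

Because `estimate` never overestimates the length of a complete run, the loop
can stop as soon as the best waiting state is no better than `best`, which is
the stopping condition of Algorithm 2.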

We have implemented these techniques into Kronos and tested them first on a
family of problems consisting of n jobs, n = 2, ..., 6, each with 4 steps. We
also make use of Kronos' capability to handle zones of varying dimensionality,
where only active clocks are considered. The results, obtained on a Pentium
P3, 666 MHz under Linux, with memory restricted to 512MB, are depicted in
Table 1. One can see that the number of symbolic states explored by the
best-first algorithm is smaller than the number of discrete states in the timed
automaton. Nevertheless the combinatorial nature of the problem cannot be
avoided.

Footnote: The problems can be found in http://www-verimag.imag.fr/~maler/jobshop

Points instead of Zones: Following the non-laziness corollary, an optimal run
can be found among the non-lazy runs and the search can be restricted to
explore only such runs. This search can be performed without using zones, but
rather using single points in the clock space (which are exactly the dominating
points of the reachable zones). This reduces significantly memory usage
($O(n)$ per symbolic state instead of $O(n^2)$) and simplifies the operations.

Sub-optimal Solutions: In order to treat larger problems we abandon opti- 
mality and use a heuristic algorithm which can quickly generate sub-optimal 
solutions. The algorithm is a mixture of breadth-first and best-first search with
a fixed number w of explored nodes at any level of the automaton. For every 
level we take the w best (according to E) symbolic states, generate their succes- 
sors but explore only the best w among them, and so on. The number w is the 
main parameter of this technique, and although the number of explored states 
grows monotonically with w, the quality of the solution does not — sometimes 
the solution found with a smaller w is better than the one found with a larger 
w. 
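
The bounded-width heuristic can be sketched as a level-by-level search that
keeps only the $w$ best states. As before this is a hedged illustration rather
than the authors' implementation: `successors`, `is_final` and `estimate` are
hypothetical callbacks, and the toy graph is invented. The ranking in the
example deliberately uses elapsed time only, so that the effect of the width
is visible.

```python
import heapq
import math


def bounded_width(start, successors, is_final, estimate, w):
    """Level-by-level search that explores at most w states per level."""
    level, best = [start], math.inf
    while level:
        candidates = []
        for state in level:
            for nxt in successors(state):
                if is_final(nxt):
                    best = min(best, estimate(nxt))
                else:
                    candidates.append(nxt)
        # Keep only the w most promising successors for the next level.
        level = heapq.nsmallest(w, candidates, key=estimate)
    return best


# Toy acyclic graph: a state is (node, elapsed time).
EDGES = {"s": [("a", 1), ("b", 4)], "a": [("f", 5)], "b": [("f", 1)], "f": []}


def successors(state):
    node, elapsed = state
    return [(nxt, elapsed + d) for nxt, d in EDGES[node]]


def is_final(state):
    return state[0] == "f"


def elapsed_only(state):
    return state[1]                       # a crude ranking, for illustration


for w in (1, 2):
    print(w, bounded_width(("s", 0), successors, is_final, elapsed_only, w))
```

With $w = 1$ the search commits to the locally cheaper branch and returns a
sub-optimal length; with $w = 2$ both branches survive and the optimum is found.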

In order to test this heuristic we took 10 problems among the most notorious
job-shop scheduling problems. Note that these are pathological problems with
a large variability in step durations, constructed to demonstrate the hardness
of job-shop scheduling. For each of these problems we have applied our
algorithms for different choices of w, both forward and backward. In Table 2 we
compare our best results on these problems with the results reported in
Table 15 of a recent survey, where the 18 best-known methods were compared.
In order to appreciate the difficulty, we also compare our results with the best
results among 3000 randomly-generated solutions for each of the problems.

5 Related Work 

This work can be viewed in the context of extending verification methodology 
in two orthogonal directions: from verification to synthesis and from qualitative 
to quantitative evaluation of behaviors. In verification we check the existence of 
certain paths in a given automaton, while in synthesis we have an automaton 
in which not all design choices have been made and we can remove transitions 
(and hence make the necessary choices) so that a property is satisfied. If we add 
a quantitative dimension (in this case, the duration of the path), verification is 
transformed to the evaluation of the worst performance measure over all paths, 
and synthesis into the restriction of the automaton to one or more optimal paths. 

The idea of applying synthesis to timed automata was first explored in 
|WH9^ . An algorithm for safety controller synthesis for timed automata, based 
on operation on zones was first reported in and later in lAIVim . 

where an example of a simple scheduler was given, and in [AMPS98]. This
algorithm is a generalization of the verification algorithm for timed automata
[HNSY94, ACD93] used in Kronos [Y97, BDM+98]. In these and other works on
Footnote: The problems are taken from ftp://mscmga.ms.ic.ac.uk/pub/jobshopl.txt.
Table 2. The results for 10 hard problems using the bounded width heuristic.
The first three columns give the problem name, no. of jobs and no. of machines
(and steps). Our results (time in seconds, the length of the best schedule found
and its deviation from the optimum) appear next, followed by the best out
of 3000 randomly-generated solutions and by the best known result for each
problem.

    problem      |         Kronos           |        Rand         |  Opt
 name   #j   #m  |  time  length  deviation |  length  deviation  | length
 ----------------+--------------------------+---------------------+-------
 FT10   10   10  |    13     982     5.59 % |    1761    89.35 %  |    930
 LA02   10    5  |     1     655     0.00 % |    1059    61.68 %  |    655
 LA19   10   10  |    12     885     5.11 % |    1612    91.45 %  |    842
 LA21   10   15  |   178    1114     6.50 % |    2339   123.61 %  |   1046
 LA24   10   15  |   186     992     5.98 % |    2100   124.00 %  |    936
 LA25   10   15  |   180    1041     6.55 % |    2209   126.10 %  |    977
 LA27   10   20  |     6    1343     8.74 % |    2809   127.45 %  |   1235
 LA29   10   20  |   193    1295    12.41 % |    2713   135.50 %  |   1152
 LA36   15   15  |    16    1391     9.70 % |    2967   133.90 %  |   1268
 LA37   15   15  |    72    1489     6.59 % |    3188   128.20 %  |   1397


treating scheduling problems as synthesis problems for timed automata, such 
as lAOPfibl . the emphasis was on yes/no properties, such as the existence of a 
feasible schedule, in the presence of an uncontrolled adversary. 

A transition toward quantitative evaluation criteria was made already in 
where timed automata were used to compute bounds on delays in real- 
time systems and in |CCM~*~9^ where variants of shortest-path problems were 
solved on a timed model much weaker than timed automata. To our knowledge, 
the first quantitative synthesis work on timed automata was [AM99], in which
the following problem has been solved: “given a timed automaton with both
controlled and uncontrolled transitions, restrict the automaton in a way that
from each configuration the worst-case time to reach a target state is minimal”.
If there is no adversary, this problem corresponds to finding the shortest path.
Due to the presence of an adversary, the solution in [AM99] employs backward
computation (dynamic programming), i.e. an iterative computation of a function
$h : Q \times \mathcal{H} \to \mathbb{R}_+$ such that $h(q, v)$ indicates the
minimal time for reaching the target state from $(q, v)$. The implementation of
the forward algorithm used in this paper can be viewed as iterating with a
function $h$ such that $h(q, v)$ indicates the minimal time to reach $(q, v)$
from the initial state. The reachable states in the augmented clock-space are
nothing but a relational representation of $h$.

Around the same time, in the framework of the VHS (Verification of Hybrid 
systems) project, a simplified model of a steel plant was presented as a case-study 
lESSSj. The model had more features than the job-shop scheduling problem 
such as upper-bounds on the time between steps, transportation problems, etc. 
A. Fehnker proposed a timed automaton model of this plant from which feasible
schedules could be extracted [F99]. This work inspired us to find a systematic
connection between classical scheduling problems and timed automata,
upon which this paper is based. Another work in this direction was concerned
with another VHS case-study, a cyclic experimental batch plant at Dortmund
for which an optimal dynamic scheduler was derived in [NY00].

The idea of using heuristic search is useful not only for shortest-path
problems but for verification of timed automata (and verification in general)
where some evaluation function can guide the search toward the target goal.
These possibilities were investigated recently in [BFH+01] on several classes
of examples, including job-shop scheduling problems, where various search
procedures and heuristics were explored and compared.

In [NTY00] it was shown that in order to find shortest paths in a timed
automaton, it is sufficient to look at acyclic sequences of symbolic states (a
fact that we do not need due to the acyclicity of job-shop automata) and an
algorithm based on forward reachability was introduced. A recent generalization
of the shortest path problem was investigated by fRFH+flfhj and EIEn]. In 
this model there is a different price for staying in any state and the total cost 
associated with the run progresses in different slopes along the path. It has been 
proved that the problem of finding the path with the minimal cost is computable. 



6 Conclusions 

We have suggested a novel application of timed automata, namely for solving 
job-shop scheduling problems. We believe that the insight gained from this point 
of view will contribute both to scheduling and to the study of timed automata. 
We have demonstrated that the performance of automata-based methods is not 
inferior to other methods developed within the last three decades. There are 
still many potential improvements to be explored such as the application of 
partial-order methods, more symbolic representation of the discrete states, new 
heuristics, etc. The most interesting challenge is to adapt these techniques for
more complex scheduling situations such as those involving uncertainty or logical
dependencies among tasks.
Abstract. In this paper we present an algorithm for efficiently computing
the optimal cost of reaching a goal state in the model of Linearly Priced
Timed Automata (LPTA). The central contribution of this paper is a
priced extension of so-called zones. This, together with a notion of facets
of a zone, allows the entire machinery for symbolic reachability for timed
automata in terms of zones to be lifted to cost-optimal reachability using
priced zones. We report on experiments with a cost-optimizing extension
of Uppaal on a number of examples.



1 Introduction 



Well-known formal verification tools for real-time and hybrid systems, such as
Uppaal, Kronos and HyTech, use symbolic techniques to deal with the infinite
state spaces that are caused by the presence of
continuous variables in the associated verification models. However, symbolic 
model checkers still share the “state space explosion problem” with their non- 
symbolic counterparts as the major obstacle for their application to non-trivial 
problems. A lot of research, therefore, is devoted to the containment of this 
problem. 

An interesting idea for model checking of reachability properties that has
received more attention recently is to “guide” the exploration of the (symbolic)
state space such that “promising” sets of states are visited first. In a number
of recent publications, model checkers have been used to solve a number of
non-trivial scheduling problems, reformulated in terms of reachability, viz. as
the (im)possibility to reach a state that improves on a given optimality
criterion. Such criteria distinguish scheduling algorithms from classical, full
state space exploration model checking algorithms. They are used together with,
for example, branch-and-bound techniques to prune parts of the search tree that
are guaranteed not to contain optimal solutions. This observation motivates
research into the extension of model checking algorithms with optimality
criteria. They provide a basis for the (cost-) guided exploration of state
spaces, and improve the potential of model checking techniques for the
resolution of scheduling problems. We believe that such extensions can be
interesting for real-life applications of both model checking and scheduling.

Based on similar observations, an extension of the timed automata model
with a notion of cost, the Linearly Priced Timed Automata (LPTA), has already
been introduced. This model allows for a reachability analysis in terms of
accumulated cost of traces, i.e. the sum of the costs of the individual
transitions in the trace. Each action transition has an associated price $p$
determining its cost. Likewise, each location has an associated rate $r$ and
the cost of delaying $d$ time units is $d \cdot r$. Computability of
minimal-cost reachability has been demonstrated, in two independent works,
based on a cost-extension of the classical notion of regions.

Although ensuring computability, the region construction is known to be very 
inefficient. Tools like Uppaal and Kronos use symbolic states of the form $(l, Z)$,
where $l$ is a location of the timed automaton and $Z$ is a zone, i.e. a convex set
of clock valuations. The central contribution of this paper is the extension of 
this concept to that of priced zones, which are attributed with an (affine) linear 
function of clock valuations that defines the cost of reaching a valuation in the 
zone. We show that the entire machinery for symbolic reachability in terms of 
zones can be lifted to cost-optimal reachability for priced zones. It turns out that 
some of the operations on priced zones force us to split them into parts with 
different price attributes, giving rise to a new notion, viz. that of the facets of a 
zone. 

The suitability of the LPTA model for scheduling problems has already been
illustrated using the more restricted Uniformly Priced Timed Automata (UPTA)
model, admitting an efficient priced zone implementation via Difference Bound
Matrices. The model was used to consider traces for the time-optimal scheduling
of a steel plant and a number of job shop problems. The greater expressivity of
LPTA also supports other measures of cost, like idle time, weighted idle time,
mean completion time, earliness, number of tardy jobs, tardiness, etc. We take
an aircraft landing problem as the application example for this paper.

The structure of the rest of this paper is as follows. In Section 2 we give an 
abstract account of symbolic optimal reachability in terms of priced transition 
systems, including a generic algorithm for optimal reachability. In Section 3 we 
introduce the model of linearly priced timed automata (LPTA) as a special case 
of the framework of Section 2. We also introduce here our running application 
example, the aircraft landing problem. Section 4 contains the definition of the 
central concept of priced zones. The operations that we need on priced zones 
and facets are provided in Section 5. The implementation of the algorithm, and 
the results of experimentation with our examples are reported in Section 6. Our 
conclusions, finally, are presented in Section 7. 
2 Symbolic Optimal Reachability



Analysis of infinite state systems requires symbolic techniques in order to
effectively represent and manipulate sets of states simultaneously. For
analysis of cost-optimality, additional information
of costs associated with individual states needs to be represented. In this section,
we describe a general framework for symbolic analysis of cost-optimal reachabil- 
ity on the abstract level of priced transition systems. 

A priced transition system is a structure $T = (S, s_0, \Sigma, \rightarrow)$,
where $S$ is an (infinite) set of states, $s_0 \in S$ is the initial state,
$\Sigma$ is a (finite) set of labels, and $\rightarrow$ is a partial function
from $S \times \Sigma \times S$ into the non-negative reals,
$\mathbb{R}_{\geq 0}$, defining the possible transitions of the system as well
as their associated costs. We write $s \xrightarrow{a}_p s'$ whenever
$\rightarrow(s, a, s')$ is defined and equals $p$. Intuitively,
$s \xrightarrow{a}_p s'$ indicates that the system in state $s$ has an
$a$-labeled transition to the state $s'$ with the cost of $p$. We denote by
$s \xrightarrow{a} s'$ that $\exists p \in \mathbb{R}_{\geq 0}.\ s \xrightarrow{a}_p s'$,
and by $s \rightarrow s'$ that $\exists a \in \Sigma.\ s \xrightarrow{a} s'$.
Now, an execution of $T$ is a sequence
$\alpha = s_0 \xrightarrow{a_1}_{p_1} s_1 \xrightarrow{a_2}_{p_2} s_2 \cdots \xrightarrow{a_n}_{p_n} s_n$.
The cost of $\alpha$, $\mathrm{cost}(\alpha)$, is the sum
$\sum_{i \in \{1, \ldots, n\}} p_i$. For a given state $s$, the minimal cost of
reaching $s$, $\mathrm{mincost}(s)$, is the infimum of the costs of finite
executions starting in the initial state $s_0$ and ending in $s$. Similarly,
the minimal cost of reaching a designated set of states $G \subseteq S$,
$\mathrm{mincost}(G)$, is the infimum of the costs of finite executions ending
in a state of $G$.



To compute minimum-cost reachability, we suggest the use of priced symbolic
states of the form $(A, \pi)$, where $A \subseteq S$ is a set of states, and
$\pi : A \to \mathbb{R}_{\geq 0}$ assigns (non-negative) costs to all states of
$A$. The intention is that reachability of the priced symbolic state
$(A, \pi)$ should ensure that any state $s$ of $A$ is reachable with cost
arbitrarily close to $\pi(s)$. As we are interested in minimum-cost
reachability, $\pi$ should preferably return as small cost values as possible.
This is obtained by the following extension of the post-operators to priced
symbolic states: for $(A, \pi)$ a priced symbolic state and $a \in \Sigma$,
$\mathrm{Post}_a(A, \pi)$ is the priced symbolic state
$(\mathrm{post}_a(A), \eta)$, where
$\mathrm{post}_a(A) = \{s' \mid \exists s \in A.\ s \xrightarrow{a} s'\}$ and
$\eta$ is given by
$\eta(s) = \inf\{\pi(s') + p \mid s' \in A \wedge s' \xrightarrow{a}_p s\}$.
That is, $\eta$ essentially gives the cheapest cost for reaching states of
$\mathrm{post}_a(A)$ via states in $A$, assuming that these may be reached with
costs according to $\pi$. A symbolic execution of a priced transition system
$T$ is a sequence $\beta = (A_0, \pi_0), \ldots, (A_n, \pi_n)$, where for
$i < n$, $(A_{i+1}, \pi_{i+1}) = \mathrm{Post}_{a_i}(A_i, \pi_i)$ for some
$a_i \in \Sigma$, and $A_0 = \{s_0\}$ and $\pi_0(s_0) = 0$. It is not difficult
to see that there is a very close connection between executions and symbolic
executions: for any execution $\alpha$ of $T$ ending in a state $s$, there is a
symbolic execution $\beta$ of $T$ that ends in a priced symbolic state
$(A, \pi)$ such that $s \in A$ and $\pi(s) \leq \mathrm{cost}(\alpha)$. Dually,
for any symbolic execution $\beta$ of $T$ ending in a priced symbolic state
$(A, \pi)$, whenever $s \in A$, then $\mathrm{mincost}(s) \leq \pi(s)$. From
this it follows that the symbolic semantics on priced symbolic states
accurately captures minimum-cost reachability in the sense that
$\mathrm{mincost}(G) = \inf\{\mathrm{minCost}(A \cap G, \pi) : (A, \pi) \text{ is reachable}\}$.
    Cost := ∞
    Passed := ∅
    Waiting := {({s_0}, π_0)}
    while Waiting ≠ ∅ do
        select (A, π) from Waiting
        if A ∩ G ≠ ∅ and minCost(A ∩ G, π) < Cost then
            Cost := minCost(A ∩ G, π)
        if for all (B, η) in Passed: (B, η) ⋢ (A, π) then
            add (A, π) to Passed
            add Post_a(A, π) to Waiting for all a ∈ Σ
    return Cost

Fig. 1. Abstract Algorithm for the Minimal-Cost Reachability Problem. 



Let $(A, \pi)$ and $(B, \eta)$ be priced symbolic states. We write
$(A, \pi) \sqsubseteq (B, \eta)$ if $B \subseteq A$ and $\pi(s) \leq \eta(s)$
for all $s \in B$, informally expressing that $(A, \pi)$ is “as big and cheap”
as $(B, \eta)$. Also, we denote by $\mathrm{minCost}(A, \pi)$ the infimum cost
in $A$ w.r.t. $\pi$, i.e. $\inf\{\pi(s) \mid s \in A\}$. Now using the above
notion of priced symbolic state and associated operations, an abstract
algorithm for computing the minimum cost of reaching a designated set of goal
states $G$ is shown in Fig. 1. It uses two data-structures, Waiting and Passed,
to store priced symbolic states waiting to be examined, and priced symbolic
states already explored, respectively. In each iteration, the algorithm
proceeds by selecting a priced symbolic state $(A, \pi)$ from Waiting, checking
that none of the previously explored states $(B, \eta)$ is bigger and cheaper,
i.e. $(B, \eta) \not\sqsubseteq (A, \pi)$, and adding it to Passed and its
successors to Waiting. In addition, the algorithm uses the global variable
Cost, which is initially set to $\infty$ and updated whenever a goal state is
found that can be reached with lower cost than the current value of Cost. The
algorithm terminates when Waiting is empty, i.e. when no further priced
symbolic states are left to be examined. When the algorithm of Fig. 1
terminates, the value of Cost equals $\mathrm{mincost}(G)$. Furthermore,
termination of the algorithm will be guaranteed provided $\sqsubseteq$ is a
well-quasi ordering on priced symbolic states.
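
For a finite priced transition system the abstract algorithm can be run
directly. The sketch below makes several simplifying assumptions that are not
in the paper: a priced symbolic state $(A, \pi)$ is stored as a Python dict
from states to costs, the transition function is a dict keyed by
(state, label), and empty successor sets are skipped. Function names and the
toy system are illustrative.

```python
import math


def post(trans, sym, a):
    """Post_a(A, pi): cheapest cost of each a-successor via states in A."""
    eta = {}
    for s, cost in sym.items():
        for t, p in trans.get((s, a), []):
            eta[t] = min(eta.get(t, math.inf), cost + p)
    return eta


def below(big, small):
    """(B, eta) ⊑ (A, pi): B covers A and is at least as cheap on all of A."""
    return all(s in big and big[s] <= c for s, c in small.items())


def min_cost_reach(trans, labels, s0, goal):
    """Abstract minimal-cost reachability with Waiting and Passed lists."""
    cost = math.inf
    passed = []
    waiting = [{s0: 0}]
    while waiting:
        sym = waiting.pop()
        goal_costs = [c for s, c in sym.items() if s in goal]
        if goal_costs and min(goal_costs) < cost:
            cost = min(goal_costs)
        # Explore only if no passed state is bigger and cheaper.
        if not any(below(b, sym) for b in passed):
            passed.append(sym)
            for a in labels:
                nxt = post(trans, sym, a)
                if nxt:
                    waiting.append(nxt)
    return cost


# Toy system with a zero-cost cycle back to the initial state.
TRANS = {
    ("s0", "a"): [("s1", 2)],
    ("s0", "b"): [("s2", 5)],
    ("s1", "b"): [("s2", 1)],
    ("s2", "a"): [("s0", 0)],
}
print(min_cost_reach(TRANS, ["a", "b"], "s0", {"s2"}))
```

The cycle through `s2` back to `s0` does not prevent termination here, because
the revisited state is more expensive than the one already in Passed and is
therefore pruned.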

The above framework may be instantiated by providing concrete syntax 
for priced transition systems, together with data-structures for priced symbolic 
states allowing for computation of the Post-operations, minCost, as well as 
$\sqsubseteq$ (which should be well-quasi). In the following sections we provide such an
instantiation for a priced extension of timed automata. 



3 Priced Timed Automata 



Linearly priced timed automata (LPTA) extend the model of timed automata
with prices on all edges and locations. In these models, the cost of taking an
edge is the price associated with it, and the price of a location gives the
cost-rate applied when delaying in that location.

Let $C$ be a set of clocks. Then $\mathcal{B}(C)$ is the set of formulas that
are conjunctions of atomic constraints of the form $x \bowtie n$ and
$x - y \bowtie m$ for $x, y \in C$, $\bowtie\ \in \{\leq, =, \geq\}$, $n$ a
natural number, and $m$ an integer. Elements of $\mathcal{B}(C)$ are called
clock constraints or zones over $C$. $\mathcal{P}(C)$ denotes the power set of
$C$. Clock values are represented as functions from $C$ to the non-negative
reals $\mathbb{R}_{\geq 0}$, called clock valuations. We denote by
$\mathbb{R}^C_{\geq 0}$ the set of clock valuations for $C$. For
$u \in \mathbb{R}^C_{\geq 0}$ and $g \in \mathcal{B}(C)$, we denote by
$u \in g$ that $u$ satisfies all constraints of $g$.

Definition 1 (Linearly Priced Timed Automata). A linearly priced timed
automaton $A$ over clocks $C$ is a tuple $(L, l_0, E, I, P)$, where $L$ is a
finite set of locations, $l_0$ is the initial location,
$E \subseteq L \times \mathcal{B}(C) \times \mathcal{P}(C) \times L$ is the set
of edges, where an edge contains a source, a guard, a set of clocks to be
reset, and a target, $I : L \to \mathcal{B}(C)$ assigns invariants to
locations, and $P : (L \cup E) \to \mathbb{N}$ assigns prices to both locations
and edges. In the case of $(l, g, r, l') \in E$, we write
$l \xrightarrow{g, r} l'$.




Fig. 2. Figure (a) depicts the cost of landing a plane at time t. Figure (b) shows an 
LPTA modelling the landing costs. Figure (c) shows an LPTA model of the runway. 



The semantics of a linearly priced timed automaton $A = (L, l_0, E, I, P)$ may
now be given as a priced transition system with state-space
$L \times \mathbb{R}^C_{\geq 0}$, with the initial state $(l_0, u_0)$ (where
$u_0$ assigns zero to all clocks in $C$), and with the finite label-set
$\Sigma = E \cup \{\delta\}$. Thus, transitions are labelled either with the
symbol $\delta$ (indicating some delay) or with an edge $e$ (the one taken).
More precisely, the priced transitions are given as follows:

- $(l, u) \xrightarrow{\delta}_p (l, u + d)$ if
  $\forall\, 0 \leq e \leq d : u + e \in I(l)$, and $p = d \cdot P(l)$,

- $(l, u) \xrightarrow{e}_p (l', u')$ if $e = (l, g, r, l') \in E$, $u \in g$,
  $u' = u[r \mapsto 0]$, and $p = P(e)$,

where for $d \in \mathbb{R}_{\geq 0}$, $u + d$ maps each clock $x$ in $C$ to
the value $u(x) + d$, and $u[r \mapsto 0]$ denotes the clock valuation which
maps each clock in $r$ to the value 0 and agrees with $u$ over $C \setminus r$.
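
The two transition rules can be illustrated with a small Python sketch. It is
written under assumptions that go beyond the text: valuations are dicts from
clock names to numbers, guards and invariants are plain predicates, and the
invariant is checked only at the start and end of a delay, which is sufficient
when invariants are convex (as conjunctions of clock constraints are). All
names are hypothetical.

```python
def delay(loc, u, d, invariant, rate):
    """Delay transition: stay in `loc` for d time units at cost d * P(loc)."""
    later = {x: v + d for x, v in u.items()}
    # For a convex invariant it is enough to check both end points.
    if not (invariant(u) and invariant(later)):
        return None
    return (loc, later), d * rate


def take_edge(loc, u, edge, price):
    """Edge transition: check the guard, reset clocks, pay the edge price P(e)."""
    source, guard, resets, target = edge
    if source != loc or not guard(u):
        return None
    after = {x: (0 if x in resets else v) for x, v in u.items()}
    return (target, after), price


# One clock x; location "a" has invariant x <= 5 and rate 2.
state, cost1 = delay("a", {"x": 0}, 3, lambda v: v["x"] <= 5, rate=2)
edge = ("a", lambda v: v["x"] >= 2, {"x"}, "b")
state, cost2 = take_edge(state[0], state[1], edge, price=4)
print(state, cost1 + cost2)
```

The cost of the two-step execution is the sum of the delay cost and the edge
price, in line with the definition of the cost of an execution given earlier.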

Footnote: For simplicity we do not deal with strict inequalities in this short
version.

Example 1 (Aircraft Landing Problem). As an example of the use of LPTAs
we consider the problem of scheduling aircraft landings at an airport.
For each aircraft there is a maximum speed and a most fuel efficient
speed which determine an earliest and latest time the plane can land. In this
interval, there is a preferred landing time called target time at which the
plane lands with minimal cost. The target time and the interval are shown as
$T$ and $[E, L]$ respectively in Fig. 2(a). For each time unit the actual
landing time deviates from the target time, the landing cost increases with
rate $e$ for early landings and rate $l$ for late landings. In addition there
is a fixed cost $d$ associated with late landings. In Fig. 2(b) the cost of
landing an aircraft is modeled as an LPTA. The automaton starts in the initial
location approaching and lands at the moment one of the two transitions labeled
landX is taken. In case the plane lands too early it enters location early in
which it delays exactly $T - t$ time units. In case the plane is late the cost
is measured in location late (i.e. the delay in location late is 0 if the plane
is on target time). After $L$ time units the automaton always ends in location
done. Figure 2(c) models a runway ensuring that two consecutive landings take
place with a minimum separation time. □

4 Priced Zones 

Typically, reachability of a (priced) timed automaton, $A = (L, l_0, E, I, P)$, is
decided using symbolic states represented by pairs of the form $(l, Z)$, where $l$ is
a location and $Z$ is a zone. Semantically, $(l, Z)$ represents the set of all states
$(l, u)$, where $u \in Z$. Whenever $Z$ is a zone and $r$ a set of clocks, we denote by $Z^{\uparrow}$
and $\{r\}Z$ the set of clock valuations obtained by delaying and resetting (w.r.t. $r$)
clock valuations from $Z$ respectively. That is, $Z^{\uparrow} = \{u + d \mid u \in Z, d \in \mathbb{R}_{\geq 0}\}$ and
$\{r\}Z = \{u[r \mapsto 0] \mid u \in Z\}$. It is well-known - using a canonical representation
of zones as Difference Bounded Matrices (DBMs) - that in both cases the
resulting set is again effectively representable as a zone. Using these operations
together with the obvious fact that zones are closed under conjunction, the
post-operations may now be effectively realised using the zone-based representation
of symbolic states as follows:

- $Post_{\delta}((l, Z)) = (l, (Z \wedge I(l))^{\uparrow} \wedge I(l))$,

- $Post_{e}((l, Z)) = (l', \{r\}(Z \wedge g))$ whenever $e = (l, g, r, l')$.
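As a rough illustration of why delay and reset stay within zones, the following Python sketch implements both operations on a DBM with non-strict bounds only. It is a textbook-style rendering under assumptions made here, not the Uppaal code: entry `d[i][j]` bounds $x_i - x_j$ from above, index 0 is the constant-zero reference clock, and the helper names `canonical`, `up` and `reset` are hypothetical.

```python
import math

INF = math.inf

def canonical(dbm):
    """Tighten every bound via all-pairs shortest paths (Floyd-Warshall)."""
    n = len(dbm)
    d = [row[:] for row in dbm]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def up(dbm):
    """Delay: drop the upper bound of every clock (input must be canonical)."""
    d = [row[:] for row in dbm]
    for i in range(1, len(d)):
        d[i][0] = INF
    return d

def reset(dbm, x):
    """Reset clock x to 0: x inherits the bounds of the reference clock."""
    d = [row[:] for row in dbm]
    for j in range(len(d)):
        d[x][j] = d[0][j]
        d[j][x] = d[j][0]
    d[x][x] = 0
    return d

# The zone 2 <= x <= 7, 2 <= y <= 6, -2 <= x - y <= 3 with clock order (0, x, y).
Z = [[0, -2, -2],
     [7,  0,  3],
     [6,  2,  0]]
```

Resetting y in `Z` yields the zone with y = 0 and 2 <= x <= 7, again a DBM of the same shape.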

Now, the framework given in Section 2 for symbolic computation of
minimum-cost reachability calls for an extension of our zone-based representation of
symbolic states, which assigns costs to individual states. For this, we introduce the
following notion of a priced zone, where the offset, $\Delta_Z$, of a zone $Z$ is the unique
clock valuation of $Z$ satisfying $\forall u \in Z.\, \forall x \in C.\; \Delta_Z(x) \leq u(x)$.

Definition 2 (Priced Zone). A priced zone $\mathcal{Z}$ is a tuple $(Z, c, r)$, where $Z$ is
a zone, $c \in \mathbb{N}$ describes the cost of the offset, $\Delta_Z$, of $Z$, and $r : C \to \mathbb{Z}$ assigns a
cost-rate $r(x)$ for any clock $x$. We write $u \in \mathcal{Z}$ whenever $u \in Z$. For any $u \in \mathcal{Z}$
the cost of $u$ in $\mathcal{Z}$, $Cost(u, \mathcal{Z})$, is defined as $c + \sum_{x \in C} r(x) \cdot (u(x) - \Delta_Z(x))$.

^ In the example we assume that several automata $A_1, \ldots, A_n$ can be composed
in parallel with a CCS-like parallel composition operator to a network
$(A_1, \ldots, A_n) \backslash Act$, with all actions $Act$ being restricted. We further assume that the
cost of delaying in the network is the sum of the cost of delaying in the individual
automata.

Fig. 3. A Priced Zone and Successor-Sets.



Thus, the cost assignments of a priced zone define a linear plane over the
underlying zone and may alternatively be described by a linear expression over
the clocks. Figure 3 illustrates the priced zone $\mathcal{Z} = (Z, c, r)$ over the clocks
$\{x, y\}$, where $Z$ is given by the six constraints $2 \leq x \leq 7$, $2 \leq y \leq 6$ and
$-2 \leq x - y \leq 3$, the cost of the offset $\Delta_Z = (2, 2)$ is $c = 4$, and the cost-rates
are $r(x) = -1$ and $r(y) = 2$. Hence, the cost of the clock valuation (5.1, 2.3) is
given by $4 + (-1) \cdot (5.1 - 2) + 2 \cdot (2.3 - 2) = 1.5$. In general the costs assigned
by $\mathcal{Z}$ may be described by the linear expression $2 - x + 2y$.
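The cost function of Definition 2 is small enough to check by hand, and the following Python sketch does so for the priced zone of the example. The function name `cost` and the dictionary encoding of rates and offset are choices made for this illustration; exact rationals are used so that the value 1.5 is reproduced without rounding.

```python
from fractions import Fraction

def cost(u, c, rates, offset):
    """Cost(u, Z) = c + sum over clocks x of r(x) * (u(x) - offset(x))."""
    return c + sum(rates[x] * (Fraction(u[x]) - offset[x]) for x in rates)

# Priced zone of the example: offset (2, 2), c = 4, r(x) = -1, r(y) = 2.
RATES = {"x": -1, "y": 2}
OFFSET = {"x": 2, "y": 2}

def linear_form(u):
    """The equivalent linear expression 2 - x + 2y."""
    return 2 - Fraction(u["x"]) + 2 * Fraction(u["y"])
```

For the valuation (5.1, 2.3), passed as decimal strings to keep the arithmetic exact, both `cost` and `linear_form` evaluate to 3/2.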

Now, priced symbolic states are represented in the obvious way by pairs $(l, \mathcal{Z})$,
where $l$ is a location and $\mathcal{Z}$ a priced zone. More precisely, $(l, \mathcal{Z})$ represents the
priced symbolic state $(A, \pi)$, where $A = \{(l, u) \mid u \in Z\}$ and $\pi(l, u) = Cost(u, \mathcal{Z})$.

Unfortunately, priced symbolic states are not directly closed under the
Post-operations. To see this, consider a timed automaton A with two locations $l$ and
$m$ and a single edge from $l$ to $m$ with trivial guard (true) and resetting the clock
$y$. The cost-rate of $l$ is 3 and the transition has zero cost. Now, let $\mathcal{Z} = (Z, c, r)$
be the priced zone depicted in Fig. 3 and consider the associated priced symbolic
state $(l, \mathcal{Z})$. Assuming that the $e$-successor set, $Post_e(l, \mathcal{Z})$, was expressible as a
single priced symbolic state $(l', \mathcal{Z}')$, this would obviously require $l' = m$ and $\mathcal{Z}' =
(Z', c', r')$ with $Z' = \{y\}Z$. Furthermore, following our framework of Section 2,
the cost-assignment of $\mathcal{Z}'$ should be such that $Cost(u', \mathcal{Z}') = \inf\{Cost(u, \mathcal{Z}) \mid u \in
Z \wedge u[y \mapsto 0] = u'\}$ for all $u' \in Z'$. Since $r(y) > 0$, it is obvious that these infima
are obtained along the lower boundary of $Z$ with respect to $y$ (see Fig. 3, left).
E.g. $Cost((2, 0), \mathcal{Z}') = 4$, $Cost((4, 0), \mathcal{Z}') = 2$, and $Cost((6, 0), \mathcal{Z}') = 2$. In general
$Cost((x, 0), \mathcal{Z}') = Cost((x, 2), \mathcal{Z}) = 6 - x$ for $2 \leq x \leq 5$, and $Cost((x, 0), \mathcal{Z}') =
Cost((x, x - 3), \mathcal{Z}) = x - 4$ for $5 \leq x \leq 7$. However, the disagreement w.r.t. the
cost-rate of $x$ ($-1$ or $1$) makes it clear that the desired cost-assignment is not
linear and hence not obtainable from any single priced zone. On the other hand,
it also shows that splitting $Z' = \{y\}Z$ into the sub-zones $Z'_1 = Z' \wedge 2 \leq x \leq 5$
and $Z'_2 = Z' \wedge 5 \leq x \leq 7$ allows the $e$-successor set $Post_e(l, \mathcal{Z})$ to be expressed
using the union of two priced zones (with $r(x) = -1$ in $\mathcal{Z}'_1$ and $r(x) = 1$ in $\mathcal{Z}'_2$).
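The piecewise cost after resetting y can be reproduced numerically. The sketch below hard-codes the example zone ($2 \leq x \leq 7$, $2 \leq y \leq 6$, $-2 \leq x - y \leq 3$) and its cost expression $2 - x + 2y$; the function name `cost_after_reset_y` is hypothetical and the code is specific to this one example.

```python
from fractions import Fraction

def cost_after_reset_y(x):
    """Infimum of 2 - x + 2y over all y with (x, y) in the example zone.

    Because the rate of y is positive, the infimum is attained on the lower
    boundary of the zone w.r.t. y, i.e. at y = max(2, x - 3).
    """
    x = Fraction(x)
    if not 2 <= x <= 7:
        raise ValueError("x lies outside the zone")
    y_low = max(Fraction(2), x - 3)
    return 2 - x + 2 * y_low

# Slope is -1 on [2, 5] (facet y = 2) and +1 on [5, 7] (facet x - y = 3),
# so no single linear cost function over {y}Z can represent it.
```

The kink at x = 5 is exactly where the two lower relative facets w.r.t. y meet, which is why the successor splits into two priced zones there.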
Fig. 4. A Zone: Facets and Operations.



Similarly, priced symbolic states are not directly closed w.r.t. $Post_{\delta}$. To see
this, consider again the LPTA A from above and the priced zone $\mathcal{Z} = (Z, c, r)$
depicted in Fig. 3. Clearly, the set $Post_{\delta}(l, \mathcal{Z})$ must cover the zone $Z^{\uparrow}$ (see Fig. 3).
It can be seen that, although $Post_{\delta}(l, \mathcal{Z})$ is not expressible as a single priced
symbolic state, it may be expressed as a finite union by splitting the zone $Z^{\uparrow}$ into
the three sub-zones $Z$, $Z^{\uparrow}_1 = (Z^{\uparrow} \setminus Z) \wedge (x - y \leq 1)$, and $Z^{\uparrow}_2 = (Z^{\uparrow} \setminus Z) \wedge (x - y \geq 1)$.

5 Facets and Operations on Priced Zones 

The universal key to expressing successor sets of priced symbolic states as finite
unions is provided by the notion of facets of a zone $Z$. Formally, whenever $x \bowtie n$
($x - y \bowtie m$) is a constraint of $Z$, the strengthened zone $Z \wedge (x = n)$ ($Z \wedge (x - y =
m)$) is a facet of $Z$. Facets derived from lower bounds on individual clocks, $x \geq n$,
are classified as lower facets, and we denote by $LF(Z)$ the collection of all lower
facets of $Z$. Similarly, the collection of upper facets, $UF(Z)$, of a zone $Z$ is
derived from upper bounds of $Z$. We refer to lower as well as upper facets as
individual clock facets. Facets derived from lower bounds of the forms $x \geq n$ or
$x - y \geq m$ are classified as lower relative facets w.r.t. $x$. The collection of lower
relative facets of $Z$ w.r.t. $x$ is denoted $LF_x(Z)$. The collection of upper relative
facets of $Z$ w.r.t. $x$, $UF_x(Z)$, is derived similarly. Figure 4 (left) illustrates a zone
$Z$ together with its six facets: e.g. $\{Z_1, Z_6\}$ constitutes the lower facets of $Z$, and
$\{Z_1, Z_2\}$ constitutes the lower relative facets of $Z$ w.r.t. $y$.

The importance of facets comes from the fact that they allow for decompo- 
sitions of the delay- and reset-operations on zones as follows: 

Lemma 1. Let Z be a zone and y a clock. Then the following holds: 

i) $Z^{\uparrow} = \bigcup_{F \in LF(Z)} F^{\uparrow}$          iii) $\{y\}Z = \bigcup_{F \in LF_y(Z)} \{y\}F$

ii) $Z^{\uparrow} = Z \cup \bigcup_{F \in UF(Z)} F^{\uparrow}$     iv) $\{y\}Z = \bigcup_{F \in UF_y(Z)} \{y\}F$

Informally (see Fig. 4 (right)), i) and ii) express that any valuation reachable by
delay from $Z$ is reachable from one of the lower facets of $Z$, as well as reachable
from one of the upper facets of $Z$ or within $Z$. iii) (and iv)) expresses that any
valuation in the projection of a zone will be in the projection of the lower (upper)
facets of the zone relative to the relevant clock.

As a first step, the delay- and reset-operation may be extended in a straight- 
forward manner to priced (relative) facets: 

Definition 3. Let $\mathcal{Z} = (F, c, r)$ be a priced zone, where $F$ is a relative facet
w.r.t. $y$ in the sense that $y - x = m$ is a constraint of $F$. Then $\{y\}\mathcal{Z} = (F', c', r')$,
where $F' = \{y\}F$, $c' = c$, and $r'(x) = r(y) + r(x)$ and $r'(z) = r(z)$ for $z \neq x$. In
case $y = n$ is a constraint of $F$, $\{y\}\mathcal{Z} = (F', c, r)$ with $F' = \{y\}F$.



Definition 4. Let $\mathcal{Z} = (F, c, r)$ be a priced zone, where $F$ is a lower or upper
facet in the sense that $y = n$ is a constraint of $F$. Let $p \in \mathbb{N}$ be a cost-rate.
Then $\mathcal{Z}^{\uparrow p} = (F', c', r')$, where $F' = F^{\uparrow}$, $c' = c$, and $r'(y) = p - \sum_{z \neq y} r(z)$ and
$r'(z) = r(z)$ for $z \neq y$.



Conjunction of constraints may be lifted from zones to priced zones simply by
taking into account the possible change of the offset. Formally, let $\mathcal{Z} = (Z, c, r)$
be a priced zone and let $g \in B(C)$. Then $\mathcal{Z} \wedge g$ is the priced zone $\mathcal{Z}' = (Z', c', r')$
with $Z' = Z \wedge g$, $r' = r$, and $c' = Cost(\Delta_{Z'}, \mathcal{Z})$. For $\mathcal{Z} = (Z, c, r)$ and $n \in \mathbb{N}$ we
denote by $\mathcal{Z} + n$ the priced zone $(Z, c + n, r)$.

The constructs of Definitions 3 and 4 essentially provide the Post-operations
for priced facets. More precisely, it is easy to show that:

$Post_e(l, \mathcal{Z}_1) = (l', \{y\}(\mathcal{Z}_1 \wedge g) + P(e))$        $Post_{\delta}(l, \mathcal{Z}_2) = (l, (\mathcal{Z}_2 \wedge I(l))^{\uparrow P(l)} \wedge I(l))$

if $e = (l, g, \{y\}, l')$, $\mathcal{Z}_1$ is a priced relative facet w.r.t. $y$ and $\mathcal{Z}_2$ is an individual
clock facet. Now, the following extension of Lemma 1 to priced symbolic states
provides the basis for the effective realisation of Post-operations in general:

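Theorem 1. Let $A = (L, l_0, E, I, P)$ be an LPTA. Let $e = (l, g, \{y\}, l') \in E$
with $P(e) = q$, $P(l) = p$, $I(l) = J$ and let $\mathcal{Z} = (Z, c, r)$ be a priced zone. Then:

$$Post_e((l, \mathcal{Z})) = \begin{cases} \{ (l', \{y\}Q + q) \mid Q \in LF_y(\mathcal{Z} \wedge g) \} & \text{if } r(y) \geq 0 \\ \{ (l', \{y\}Q + q) \mid Q \in UF_y(\mathcal{Z} \wedge g) \} & \text{if } r(y) \leq 0 \end{cases}$$

$$Post_{\delta}((l, \mathcal{Z})) = \begin{cases} \{(l, \mathcal{Z})\} \cup \{ (l, Q^{\uparrow p} \wedge J) \mid Q \in UF(\mathcal{Z} \wedge J) \} & \text{if } p \geq \sum_{x \in C} r(x) \\ \{ (l, Q^{\uparrow p} \wedge J) \mid Q \in LF(\mathcal{Z} \wedge J) \} & \text{if } p \leq \sum_{x \in C} r(x) \end{cases}$$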



® This "definition" of $\{y\}\mathcal{Z}$ is somewhat ambiguous since it depends on which
constraint involving $y$ is chosen. However, the Cost-function determined will be
independent of this choice.

^ For the case with a general reset-set $r$, the notion of relative facets may be generalized
to sets of clocks.

In the definition of $Post_e$ the successor set is described as a union of either lower
or upper relative facets w.r.t. the clock $y$ being reset, depending on the rate
of $y$ (as this will determine whether the minimum is obtained at the lower or
upper boundary). For similar reasons, in the definition of $Post_{\delta}$, the successor-set
is expressed as a union over either lower or upper (individual clock) facets
depending on the rate of the location compared to the sum of clock cost-rates.

To complete the instantiation of the framework of Section 2 it remains
to indicate how to compute minCost and $\sqsubseteq$ on priced symbolic states. Let
$\mathcal{Z} = (Z, c, r)$ and $\mathcal{Z}' = (Z', c', r')$ be priced zones and let $(l, \mathcal{Z})$ and $(l', \mathcal{Z}')$ be
corresponding priced symbolic states. Then $minCost(l, \mathcal{Z})$ is obtained by
minimizing the linear expression $c + \sum_{x \in C} r(x) \cdot (x - \Delta_Z(x))$ under the (linear)
constraints expressed by $Z$. Thus, computing minCost reduces to solving a
simple Linear Programming problem. Now let $\mathcal{Z}' \backslash \mathcal{Z}$ be the priced zone $(Z^*, c^*, r^*)$
with $Z^* = Z'$, $c^* = c' - Cost(\Delta_{Z'}, \mathcal{Z})$ and $r^*(x) = r'(x) - r(x)$ for all $x \in C$. It
is easy to see that $Cost(u, \mathcal{Z}' \backslash \mathcal{Z}) = Cost(u, \mathcal{Z}') - Cost(u, \mathcal{Z})$ for all $u \in Z'$, and
hence that $(l, \mathcal{Z}) \sqsubseteq (l', \mathcal{Z}')$ iff $l = l'$, $Z' \subseteq Z$ and $minCost(\mathcal{Z}' \backslash \mathcal{Z}) \geq 0$. Thus,
deciding $\sqsubseteq$ also reduces to a Linear Programming problem.
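For intuition only, minCost over a bounded two-clock zone can be computed without an LP solver, because a linear function over a bounded polygon attains its minimum at a vertex. The sketch below enumerates vertices by brute force; it is a stand-in for the simplex-based computation, not the prototype's method, and the name `min_cost` and the encoding of constraints as triples `(a, b, k)` meaning $a x + b y \leq k$ are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import combinations

def min_cost(constraints, c, rates, offset):
    """Minimise c + r_x*(x - off_x) + r_y*(y - off_y) over a bounded 2-clock zone."""
    best = None
    for (a1, b1, k1), (a2, b2, k2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines never meet in a vertex
        # Intersection of the two boundary lines (Cramer's rule).
        x = Fraction(k1 * b2 - k2 * b1, det)
        y = Fraction(a1 * k2 - a2 * k1, det)
        if all(a * x + b * y <= k for a, b, k in constraints):
            value = c + rates[0] * (x - offset[0]) + rates[1] * (y - offset[1])
            if best is None or value < best:
                best = value
    return best  # None if the zone is empty

# Example zone: 2 <= x <= 7, 2 <= y <= 6, -2 <= x - y <= 3.
ZONE = [(1, 0, 7), (-1, 0, -2), (0, 1, 6), (0, -1, -2), (1, -1, 3), (-1, 1, 2)]
```

With the priced zone of Section 4 (c = 4, rates (-1, 2), offset (2, 2)) the minimum is 1, attained at the vertex (5, 2).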

In exploring LPTAs using the algorithm of Fig. 1 we will only need to
consider priced zones $\mathcal{Z}$ with non-negative cost assignments in the sense that
$Cost(u, \mathcal{Z}) \geq 0$ for all $u \in Z$. Now, application of Higman's Lemma [Hig52]
ensures that $\sqsubseteq$ is a well-quasi ordering on priced symbolic states for bounded
LPTA. We refer to for more detailed arguments. 
6 Implementation and Experiments

In this section we give further details on a prototype implementation within the
tool Uppaal of priced zones, formally defined in the previous sections,
and report on experiments on the aircraft landing problem.

The prototype implements the $Post_e$ (reset), $Post_{\delta}$ (delay), minCost, and
$\sqsubseteq$ operations, using extensions of the DBM algorithms. To
minimize the number of facets considered and reduce the size of the LP problems
needed to be solved, we make heavy use of the canonical representation of zones
in terms of a minimal set of constraints. For dealing with LP
problems, our prototype currently uses a freely available implementation of the
simplex algorithm. Many of the techniques for pruning and guiding the state
space search have been used extensively in modelling and
verification.

Recall the aircraft landing problem partially described in Example 1. An
LPTA model of the costs associated with landing a single aircraft is shown in
Fig. 2(b). When landing several planes the schedule has to take into account
the separation times between planes to prevent the turbulence of one plane
from affecting another. The separation times depend on the types of the planes that
are involved. Large aircraft for example generate more turbulence than small
ones, and successive planes should consequently keep a bigger distance. To model
the separation times between two types of planes we introduce an LPTA of the
kind shown in Fig. 2(c).

® lp_solve 3.1a by Michael Berkelaar, ftp://ftp.es.ele.tue.nl/pub/lp_solve

Table 1. Results for seven instances of the aircraft landing problem. Results
were obtained on a Pentium II 333 MHz.

runways  problem instance           1         2         3         4         5         6         7
         number of planes          10        15        20        20        20        30        44
         number of types            2         2         2         2         2         4         2

1        optimal value            700      1480       820      2520      3100     24442      1550
         explored states          481      2149       920      5693     15069       122       662
         cputime (secs)          4.19     25.30     11.05     87.67    220.22      0.60      4.27

2        optimal value             90       210        60       640       650       554         0
         explored states         1218      1797       669     28821     47993      9035        92
         cputime (secs)         17.87     39.92     11.02    755.84   1085.08    123.72      1.06

3        optimal value              0         0         0       130       170         0         -
         explored states           24        46        84    207715    189602        62       N/A
         cputime (secs)          0.36      0.70      1.71  14786.19  12461.47      0.68         -

4        optimal value              -         -         -         0         0         -         -
         explored states          N/A       N/A       N/A        65        64       N/A       N/A
         cputime (secs)             -         -         -      1.97      1.53         -         -

Table 1 presents the results of an experiment where the prototype was applied
to seven instances of the aircraft landing problem taken from [BKA00]. For each
instance, which varies in the number of planes and plane types, we compute the
cost of the optimal schedule. In cases where the cost is non-zero we increase the number
of runways until a schedule of cost 0 is found. In all instances, the state space is
explored in minimal cost-order, i.e. we select from the waiting list the priced zone
$(l, \mathcal{Z})$ with lowest $minCost(l, \mathcal{Z})$. Equal values are distinguished by selecting first
the zone which results from the largest number of transitions, and secondly by
selecting the zone which involves the plane with the shortest target time. As
can be seen from the table, our current prototype implementation is able to deal
with all the tested instances. Beasley et al. [BKA00] solve all problem instances
with a linear programming based tree search algorithm, in cases where the initial
solution - obtained with a heuristic - is not zero. In 7 of the 15 benchmarks
(with optimal solution greater than zero) the time-performance of our method is
better than theirs. These are the instances 4 to 7 with fewer than 3 runways. This
result also holds if we take into account that our computer is about 50% faster
(according to the Dongarra Linpack benchmarks [Don01]). It should be noted,
however, that our solution-times are quite incomparable to those of Beasley et al.
For some instances our approach is up to 25 times slower, while for others it is
up to 50 times faster than the approach in [BKA00].
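The exploration order described above (lowest minCost first, then the largest number of transitions, then the shortest target time) maps naturally onto a priority queue. The following Python sketch shows one way to encode it with the standard `heapq` module; the class name `WaitingList`, the parameter names, and the reading of "shortest target time" as the numerically smallest one are assumptions made here, not details of the Uppaal prototype.

```python
import heapq
from itertools import count

class WaitingList:
    """Best-first waiting list with the two tie-breaking rules."""

    def __init__(self):
        self._heap = []
        self._tick = count()  # unique counter so states are never compared

    def push(self, state, min_cost, depth, target_time):
        # Smaller tuples pop first: low cost, then deep states (hence -depth),
        # then small target time; the tick makes every key distinct.
        key = (min_cost, -depth, target_time, next(self._tick))
        heapq.heappush(self._heap, (key, state))

    def pop(self):
        return heapq.heappop(self._heap)[1]

    def __len__(self):
        return len(self._heap)
```

Encoding the tie-breakers in the key keeps the search loop itself unchanged when the ordering heuristics are revised.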



° These and other benchmarks are available at ftp://mscmga.ms.ic.ac.uk/pub.

^ This is always possible as the cost of landing on target time is 0 and the number of
runways can be increased until all planes arrive at target time.





The cost-extended version of Uppaal has additionally been (and is currently
being) applied to other examples, including a cost-extended version of the Bridge
Problem, an optimal broadcast problem and a testing problem.

7 Conclusion 

In this paper we have considered the minimum-cost reachability problem for
LPTAs. The notions of priced zones and facets of a zone are central contributions of
the paper underlying our extension of the tool Uppaal. Our initial experimental
investigations based on a number of examples are quite encouraging.

Compared with the existing special-purpose, time-optimizing version of
Uppaal, the presented general cost-minimizing implementation only
marginally degrades performance. In particular, the theoretical possibility of
uncontrolled splitting of zones does not occur in practice. In addition, the
consideration of non-uniform cost seems to significantly reduce the number of symbolic
states explored.

The single most important question, which calls for future research, is how
to exploit the simple structure of the LP-problems considered. We may benefit
significantly from replacing the currently used LP package with some package
more tailored towards small-size problems.



References 



AC91. D. Applegate and W. Cook. A Computational Study of the Job-Shop
Scheduling Problem. ORSA Journal on Computing 3, pages 149-156, 1991.

ACJYK96. P. Abdulla, K. Cerans, B. Jonsson, and T. Yih-Kuen. General decidability
theorems for infinite-state systems, 1996.

AD90. R. Alur and D. Dill. Automata for Modelling Real-Time Systems. In Proc.
of Int. Colloquium on Algorithms, Languages and Programming, number
443 in Lecture Notes in Computer Science, pages 322-335, July 1990.

AJ94. P. Abdulla and B. Jonsson. Undecidability of verifying programs with
unreliable channels. In Proc. 21st Int. Coll. Automata, Languages, and
Programming (ICALP'94), volume 820 of LNCS, 1994.

ATP. R. Alur, S. La Torre, and G. J. Pappas. Optimal paths in weighted timed
automata. To appear in HSCC2001.

BDM+98. M. Bozga, C. Daws, O. Maler, A. Olivero, S. Tripakis, and S. Yovine.
Kronos: A Model-Checking Tool for Real-Time Systems. In Proc. of the
10th Int. Conf. on Computer Aided Verification, number 1427 in Lecture
Notes in Computer Science, pages 546-550. Springer-Verlag, 1998.

BFH+01a. G. Behrmann, A. Fehnker, T. Hune, K.G. Larsen, P. Pettersson, and
J. Romijn. Efficient guiding towards cost-optimality in Uppaal. To appear
in Proceedings of TACAS'2001.

BFH+01b. G. Behrmann, A. Fehnker, T. Hune, K.G. Larsen, P. Pettersson, J. Romijn,
and F. Vaandrager. Minimum-Cost Reachability for Priced Timed
Automata. To appear in Proceedings of HSCC2001, 2001.

BKA00. J.E. Beasley, M. Krishnamoorthy, and D. Abramson. Scheduling Aircraft
Landings - The Static Case. Transportation Science, 34(2):180-197, 2000.



BM00. Ed Brinksma and Angelika Mader. Verification and optimization of a PLC
control schedule. In Proceedings of the 7th SPIN Workshop, volume 1885
of Lecture Notes in Computer Science. Springer-Verlag, 2000.

Cer94. K. Cerans. Deciding properties of integral relational automata. In
Proceedings of ICALP 94, volume 820 of LNCS, 1994.

Dil89. D. Dill. Timing Assumptions and Verification of Finite-State Concurrent
Systems. In J. Sifakis, editor, Proc. of Automatic Verification Methods for
Finite State Systems, number 407 in Lecture Notes in Computer Science,
pages 197-212. Springer-Verlag, 1989.

Don01. Jack J. Dongarra. Performance of Various Computers Using
Standard Linear Equations Software. Technical Report CS-89-85,
Computer Science Department, University of Tennessee,
2001. An up-to-date version of this report can be found at
http://www.netlib.org/benchmark/performance.ps

Feh99. A. Fehnker. Scheduling a steel plant with timed automata. In Proceedings
of the 6th International Conference on Real-Time Computing Systems and
Applications (RTCSA99), pages 280-286. IEEE Computer Society, 1999.

FS98. A. Finkel and P. Schnoebelen. Fundamental structures in well-structured
infinite transition systems. In Proc. 3rd Latin American Theoretical
Informatics Symposium (LATIN'98), volume 1380 of LNCS, 1998.

FS01. A. Finkel and Ph. Schnoebelen. Well structured transition systems
everywhere. Theoretical Computer Science, 256(1-2):64-92, 2001.

HHWT97. T.A. Henzinger, P.-H. Ho, and H. Wong-Toi. HyTech: A Model Checker
for Hybrid Systems. In Orna Grumberg, editor, Proc. of the 9th Int. Conf.
on Computer Aided Verification, number 1254 in Lecture Notes in
Computer Science, pages 460-463. Springer-Verlag, 1997.

Hig52. G. Higman. Ordering by divisibility in abstract algebras. Proc. of the
London Math. Soc., 2:326-336, 1952.

HLP00. T. Hune, K.G. Larsen, and P. Pettersson. Guided Synthesis of Control
Programs Using Uppaal. In Ten H. Lai, editor, Proc. of the IEEE ICDCS
International Workshop on Distributed Systems Verification and
Validation, pages E15-E22. IEEE Computer Society Press, April 2000.

LLPY97. Fredrik Larsson, Kim G. Larsen, Paul Pettersson, and Wang Yi.
Efficient Verification of Real-Time Systems: Compact Data Structures and
State-Space Reduction. In Proc. of the 18th IEEE Real-Time Systems
Symposium, pages 14-24. IEEE Computer Society Press, December 1997.

LPY97. K.G. Larsen, P. Pettersson, and W. Yi. Uppaal in a Nutshell. Int. Journal
on Software Tools for Technology Transfer, 1(1-2):134-152, October 1997.

Mil89. R. Milner. Communication and Concurrency. Prentice Hall, Englewood
Cliffs, 1989.

NY99. P. Niebert and S. Yovine. Computing optimal operation schemes for multi
batch operation of chemical plants. VHS deliverable, May 1999. Draft.

RB98. T.C. Ruys and E. Brinksma. Experience with Literate Programming in
the Modelling and Validation of Systems. In Bernhard Steffen, editor,
Proceedings of the Fourth International Conference on Tools and Algorithms
for the Construction and Analysis of Systems (TACAS'98), number 1384
in Lecture Notes in Computer Science (LNCS), pages 393-408, Lisbon,
Portugal, April 1998. Springer-Verlag, Berlin.

Rok93. T.C. Rokicki. Representing and Modeling Digital Circuits. PhD thesis,
Stanford University, 1993.



Binary Reachability Analysis of 
Pushdown Timed Automata with Dense Clocks 



Zhe Dang 

School of Electrical Engineering and Computer Science 
Washington State University, Pullman, WA 99164, USA 
zdang@eecs.wsu.edu



Abstract. We consider pushdown timed automata (PTAs) that are timed au- 
tomata (with dense clocks) augmented with a pushdown stack. A configuration of 
a PTA includes a control state, dense clock values and a stack word. By using the 
pattern technique, we give a decidable characterization of the binary reachability 
(i.e., the set of all pairs of configurations such that one can reach the other) of 
a PTA. Since a timed automaton can be treated as a PTA without the pushdown 
stack, we can show that the binary reachability of a timed automaton is definable 
in the additive theory of reals and integers. The results can be used to verify a 
class of properties containing linear relations over both dense variables and un- 
bounded discrete variables. The properties previously could not be verified using 
the classic region technique nor expressed by timed temporal logics for timed 
automata and CTL* for pushdown systems. 



1 Introduction 



A timed automaton can be considered as a finite automaton augmented with a
number of dense (either real or rational) clocks. Due to their ability to model and analyze
a wide range of real-time systems, timed automata have been extensively studied in
recent years. In particular, by using the standard region
technique, it has been shown that region reachability for timed automata is decidable.
This fundamental result and the technique help researchers, both theoretically and
practically, in formulating various timed temporal logics and
developing verification tools.

Region reachability is useful but has intrinsic limitations. In many real-world
applications, we might also want to know whether a timed automaton satisfies a
non-region (e.g., Presburger) property. Recently, Comon and Jurski have shown
that the binary reachability of a timed automaton is definable in the additive theory of
reals, by flattening a timed automaton into a real-valued counter machine without nested
cycles. The result immediately paves the way for automatic verification of a class
of non-region properties that previously were not possible using the region technique.

In this paper, inspired by Comon and Jurski's result, we consider pushdown
timed automata (PTAs) that are obtained by augmenting timed automata with a
pushdown stack. The main result in this paper gives a decidable binary reachability
characterization for PTAs such that a class of non-region properties can be verified. A possible
way to show this result is to look at the flattening technique of Comon and Jurski to see
whether the technique can be adapted by adding a pushdown stack. However, this
approach has an inherent difficulty: the flattening technique, as pointed out in their paper,
destroys the structure of the original timed automaton, and thus, the sequences of stack
operations cannot be maintained after flattening.

G. Berry, H. Comon, and A. Finkel (Eds.): CAV 2001, LNCS 2102, 2001.
© Springer-Verlag Berlin Heidelberg 2001

In this paper, we introduce a new technique, called the pattern technique, by
separating a dense clock into an integral part and a fractional part. For a pair $(v_0, v_1)$
of two tuples of clock values, we define an ordering, called the pattern of $(v_0, v_1)$,
on the fractional parts of $v_0$ and $v_1$. An equivalence relation $\approx$ is defined such that
$(v_0, v_1) \approx (v'_0, v'_1)$ iff $v_0$ and $v'_0$ ($v_1$ and $v'_1$ will also) have the same integral parts, and
both $(v_0, v_1)$ and $(v'_0, v'_1)$ have the same pattern. $\approx$ preserves the binary reachability:
$v_0$ can reach $v_1$ by a sequence of transitions iff $v'_0$ can reach $v'_1$ by the (almost) same
sequence of transitions. Therefore, by preserving the (almost) same control structure,
a PTA can be transformed into a discrete transition system (called the pattern graph)
containing discrete clocks (for the integral parts of the dense clocks) and a finite
variable over patterns. The pattern graph can be further reduced to a discrete PTA, whose
binary reachability is decidable and can be accepted by a nondeterministic pushdown
automaton augmented with reversal-bounded counters (NPCM). By translating a
pattern back to a relation over the fractional parts of the clocks, the decidable binary
reachability characterization (namely, (D + NPCM)-definable) for PTAs can be
derived. Given this characterization, it can be shown that the particular class of safety
properties that contain mixed linear relations over both dense variables (e.g., clock
values) and discrete variables (e.g., word counts) can be automatically verified for PTAs.
In this extended abstract, all the proofs are omitted. For a complete exposition see M- 

2 Preliminaries 

A nondeterministic multicounter machine is a nondeterministic machine with a finite 
number of states, a one-way input tape, and a finite number of integer counters. Each 
counter can be incremented by 1, decremented by 1, or stay unchanged. Besides, a 
counter can be tested against 0. A reversal-bounded nondeterministic multicounter 
machine (NCM) is a nondeterministic multicounter machine in which each counter is 
reversal-bounded (i.e., it changes mode between nondecreasing and nonincreasing for 
some bounded number of times). A reversal-bounded nondeterministic pushdown mul- 
ticounter machine (NPCM) is an NCM augmented with a pushdown stack. It is known 
that the emptiness problem for NPCMs (and hence NCMs) is decidable.

Let $\mathbb{N}$ be integers, $D = \mathbb{Q}$ (rationals) or $\mathbb{R}$ (reals), $\Gamma$ be an alphabet. We use $\mathbb{N}^+$
and $D^+$ to denote non-negative values in $\mathbb{N}$ and $D$, respectively. Each value $v \in D$
can be uniquely expressed as the sum of $\lceil v \rceil + \lfloor v \rfloor$, where $\lceil v \rceil \in \mathbb{N}$ is the integral
part of $v$, and $0 \leq \lfloor v \rfloor < 1$ is the fractional part of $v$. Given $m \geq 1$. Let $x_i$, $y_i$,
and $w_i$ be a dense variable over $D$, an integer variable over $\mathbb{N}$, and a word variable
over $\Gamma^*$, for each $1 \leq i \leq m$, respectively. We use $\#_a(w_i)$ to denote a count variable
representing the number of symbol $a \in \Gamma$ in $w_i$. A linear term $t$ is defined as follows:
$t ::= n \mid x_i \mid y_i \mid \#_a(w_i) \mid t - t \mid t + t$, where $n \in \mathbb{N}$, $a \in \Gamma$. A mixed linear
relation $l$ is defined as follows: $l ::= t > 0 \mid t = 0 \mid t_{discrete} \bmod n = 0 \mid \neg l \mid l \wedge l$,
where $0 \neq n \in \mathbb{N}$ and $t_{discrete}$ is a linear term not containing dense variables. A dense
linear relation is a linear relation that contains dense variables only. A discrete linear
relation is a linear relation that does not contain dense variables.

A tuple of integers and words can be encoded as a string by concatenating the unary
representations of each integer and each of the words, with a separator $\# \notin \Gamma$. The
domain of $H$, a predicate over integer variables and word variables, is the set of tuples of
integers and words that satisfy $H$. $H$ is an NPCM predicate (or simply NPCM) if there
is an NPCM accepting the domain (encoded as a set of strings, i.e., a language) of $H$. A
(D + NPCM)-formula $f$ is defined as follows: $f ::= l_{dense} \wedge H \mid l_{dense} \vee H \mid f \vee f$,
where $l_{dense}$ is a dense linear relation and $H$ is an NPCM predicate. Given $p, q, r \geq 0$.
A predicate $A$ on tuples in $D^p \times \mathbb{N}^q \times (\Gamma^*)^r$ is (D + NPCM)-definable if there
is a (D + NPCM)-formula $f$ with $p$ dense variables, $p + q$ integer variables, and $r$
word variables, such that, for all $x_1, \ldots, x_p \in D$, $y_1, \ldots, y_q \in \mathbb{N}$, and $w_1, \ldots, w_r \in
\Gamma^*$, $(x_1, \ldots, x_p, y_1, \ldots, y_q, w_1, \ldots, w_r) \in A$ iff $f(\lfloor x_1 \rfloor, \ldots, \lfloor x_p \rfloor, \lceil x_1 \rceil, \ldots, \lceil x_p \rceil,
y_1, \ldots, y_q, w_1, \ldots, w_r)$ holds.

Lemma 1. (1). Both $l_{discrete} \wedge H$ and $l_{discrete} \vee H$ are NPCM predicates, if $l_{discrete}$
is a discrete linear relation and $H$ is an NPCM predicate. (2). NPCM predicates are
closed under existential quantifications (over integer variables and word variables).
(3). If $A$ is (D + NPCM)-definable and $l$ is a mixed linear relation, then both $l \wedge A$
and $l \vee A$ are (D + NPCM)-definable. (4). The emptiness (satisfiability) problem for
(D + NPCM)-definable predicates is decidable.



3 Clock Patterns and Their Changes 

A dense clock is simply a dense variable on $D^+$. Fix a $k > 0$ and consider $k + 1$ clocks
$x = x_0, \ldots, x_k$. For technical reasons, $x_0$ is an auxiliary clock indicating the current
time now. Denote $K = \{0, \ldots, k\}$ and $K^+ = \{1, \ldots, k\}$. A subset $K'$ of $K$ is abused
as a set of clocks; i.e., we say $x_i \in K'$ if $i \in K'$. A (clock) valuation $v$ is a function
$K \to D^+$ that assigns a value in $D^+$ to each clock in $K$. A discrete (clock) valuation
$u$ is a function $K \to \mathbb{N}^+$ that assigns a value in $\mathbb{N}^+$ to each clock in $K$. For each
valuation $v$ and $\delta \in D^+$, $\lceil v \rceil$, $\lfloor v \rfloor$ and $v + \delta$ are valuations satisfying $\lceil v \rceil(i) = \lceil v(i) \rceil$,
$\lfloor v \rfloor(i) = \lfloor v(i) \rfloor$ and $(v + \delta)(i) = v(i) + \delta$ for each $i \in K$. The relative representation
$\hat{v}$ of a valuation $v$ is a valuation satisfying: (1). $\lceil \hat{v} \rceil = \lceil v \rceil$, (2). $\lfloor \hat{v} \rfloor(0) = \lfloor 1 - \lfloor v \rfloor(0) \rfloor$,
(3). $\lfloor \hat{v} \rfloor(i) = \lfloor \lfloor v \rfloor(i) + \lfloor \hat{v} \rfloor(0) \rfloor$, for each $i \in K^+$. A valuation $v_0$ is initial if clock
$x_0$ has value 0, i.e., $v_0(0) = 0$.
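The relative representation is easy to compute mechanically. The Python sketch below follows conditions (1) to (3) literally, with the paper's convention that the integral part is written $\lceil v \rceil$ and the fractional part $\lfloor v \rfloor$; the function names are hypothetical, a valuation is assumed to be a list whose first entry is the clock $x_0$, and values are given as decimal strings so that the arithmetic is exact.

```python
from fractions import Fraction
from math import floor

def integral(v):
    """Integral part of a non-negative value."""
    return floor(v)

def fractional(v):
    """Fractional part of a non-negative value, in [0, 1)."""
    return v - floor(v)

def relative(valuation):
    """Relative representation: same integral parts, shifted fractional parts."""
    vals = [Fraction(v) for v in valuation]
    # (2): the fractional part of clock 0 becomes frac(1 - frac(v(0))).
    now = fractional(1 - fractional(vals[0]))
    result = [integral(vals[0]) + now]
    # (3): every other fractional part is shifted by that amount, modulo 1.
    for v in vals[1:]:
        result.append(integral(v) + fractional(fractional(v) + now))
    return result
```

For the valuation (1.6, 2.9, 3.1) this yields (1.4, 2.3, 3.5), and an initial valuation such as (0, 5.5, 2.8) is left unchanged.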

We distinguish two disjoint sets, $K^0 = \{0^0, \ldots, k^0\}$ and $K^1 = \{0^1, \ldots, k^1\}$, of
indices. A pattern $\eta$ is a sequence $p_0, \ldots, p_n$, for some $0 \leq n < 2(k + 1)$, of nonempty
and disjoint subsets of $K^0 \cup K^1$ such that $0^0 \in p_0$ and $\bigcup_{0 \leq i \leq n} p_i = K^0 \cup K^1$. $p_i$
is called the i-position. A pair of valuations $(v_0, v_1)$ is initialized if $v_0$ is initial. An
initialized pair $(v_0, v_1)$ has pattern $\eta = p_0, \ldots, p_n$, written $(v_0, v_1) \in \eta$, if, for each
$0 \leq m, m' \leq n$, each $b, b' \in \{0, 1\}$, and each $i, i' \in K$, $i^b \in p_m$ and $i'^{b'} \in p_{m'}$ imply
that

$\lfloor \hat{v}_b \rfloor(i) = \lfloor \hat{v}_{b'} \rfloor(i')$ (resp. $<$) iff $m = m'$ (resp. $m < m'$).

$\Phi$ denotes the (finite) set of all the patterns. The now-position of $\eta$ is $p_i$,
for some $i$, with $0^1 \in p_i$. A pattern is regulated if the now-position of $\eta$ is $p_0$. A pattern
is initial if it is the pattern of $(v_0, v_0)$ for some initial valuation $v_0$. If $\eta$ is the pattern
of $(v_0, v_1)$, we use $init(\eta)$ to denote the pattern of $(v_0, v_0)$. $init(\eta)$ is unique for each
$\eta$. A pattern is a merge-pattern if the now-position is a singleton set (i.e., $0^1$ is the only
element). A pattern is a split-pattern if it is not a merge-pattern, i.e., the now-position
contains more than one element. A valuation $v_1$ has pattern $\eta$ if $\eta$ is the pattern of
$(v_0, v_1)$ for some $v_0$. A pattern of $v_1$ tells the fractional orderings between $\lfloor \hat{v}_1 \rfloor(i)$
and $\lfloor \hat{v}_1 \rfloor(j)$ and between $\lfloor \hat{v}_1 \rfloor(i)$ and 0, for all clock indices $i, j$. Given two initialized pairs
$(v_0^1, v_1)$ and $(v_0^2, v_2)$, we write $(v_0^1, v_1) \approx (v_0^2, v_2)$, if $(v_0^1, v_1)$ and $(v_0^2, v_2)$ have the
same pattern, and have the same integral parts (i.e., $\lceil v_0^1 \rceil = \lceil v_0^2 \rceil$, $\lceil v_1 \rceil = \lceil v_2 \rceil$).

Example 1. Let $v_0 = (0_{0^0}, 5.5_{1^0}, 2.3_{2^0})$ and $v_1 = (1.6_{0^1}, 2.9_{1^1}, 3.1_{2^1})$, where
subscripts are indices. Note that $\hat{v}_0 = (0_{0^0}, 5.5_{1^0}, 2.3_{2^0})$ and $\hat{v}_1 = (1.4_{0^1}, 2.3_{1^1}, 3.5_{2^1})$.
The pattern $\eta$ of $(v_0, v_1)$ can be drawn by collecting the fractional parts in $\hat{v}_0$ and $\hat{v}_1$
from small to large while writing down the indices; i.e., $\{0^0\}, \{2^0, 1^1\}, \{0^1\}, \{1^0, 2^1\}$.
$\eta$ is a merge-pattern. Take $v_2 = v_1 + .1$ and compute $\hat{v}_2 = (1.3_{0^1}, 3.3_{1^1}, 3.5_{2^1})$.
Observe that the fractional parts (except for the first component) are the same in $\hat{v}_2$
and $\hat{v}_1$. The pattern $\eta'$ of $(v_0, v_2)$ can be drawn similarly: $\{0^0\}, \{2^0, 1^1, 0^1\}, \{1^0, 2^1\}$,
which is the result of merging $0^1$ to its previous position in $\eta$. $\eta'$ is a split-pattern. Take
$v_3 = v_2 + .05$. We can verify the pattern of $(v_0, v_3)$ is $\{0^0\}, \{0^1\}, \{2^0, 1^1\}, \{1^0, 2^1\}$,
which is the result of splitting $0^1$ from the now-position of $\eta'$. This procedure can go
on while incrementing $v_3$: merge $0^1$ to the 0-position $\{0^0\}$, and then split $0^1$ from it
(by appending $\{0^1\}$ at the end), and so on. Eventually, the pattern will repeat when $0^1$
returns to the original position in $\eta$ (e.g., after a total increment of 1 from $v_1$).
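
The computation in Example 1 can be replayed mechanically. The following Python sketch computes the relative representation and the pattern of a pair of valuations with exact rational arithmetic. It assumes that $\lceil \cdot \rceil$ denotes the integral part and $\lfloor \cdot \rfloor$ the fractional part, encodes the index $i^b$ as the tuple `(i, b)`, and takes $v_0 = (0, 5.5, 2.3)$ and $v_1 = (1.6, 2.9, 3.1)$ as inputs. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction
from math import floor


def frac(x):
    """Fractional part of a non-negative rational."""
    return x - floor(x)


def relative(v):
    """Relative representation: integral parts kept, fractional parts shifted."""
    f0 = frac(1 - frac(v[0]))                      # condition (2)
    out = [floor(v[0]) + f0]
    for x in v[1:]:
        out.append(floor(x) + frac(frac(x) + f0))  # condition (3)
    return out


def pattern(v0, v1):
    """Group the indices (i, b) by fractional part, from small to large."""
    groups = {}
    for b, v in enumerate((v0, v1)):
        for i, x in enumerate(relative(v)):
            groups.setdefault(frac(x), set()).add((i, b))
    return [frozenset(groups[f]) for f in sorted(groups)]


def vec(*xs):
    """Build a valuation from decimal strings, avoiding float rounding."""
    return [Fraction(x) for x in xs]


v0 = vec("0", "5.5", "2.3")
v1 = vec("1.6", "2.9", "3.1")
eta = pattern(v0, v1)                                     # merge-pattern
eta2 = pattern(v0, [x + Fraction("0.1") for x in v1])     # after merging 0^1
eta3 = pattern(v0, [x + Fraction("0.15") for x in v1])    # after splitting 0^1
```

Using `Fraction` rather than floating point matters here, because the pattern depends on exact equality between fractional parts.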

For each $0 < \delta \in D^+$, $v + \delta$ is the result of a clock progress from $v$ by an amount
of $\delta$. Function $next : \Phi \times (N^+)^{k+1} \to \Phi \times (N^+)^{k+1}$ describes how a pattern changes
after a clock progress. Given any discrete valuation $u$ and pattern $\eta = p_0, \ldots, p_n$ with
the now-position being $p_i$ for some $i$, $next(\eta, u)$ is defined to be $(\eta', u')$ such that,

- (the case when $\eta$ is a merge-pattern) if $i > 0$ and $|p_i| = 1$ (that is, $p_i = \{0^1\}$),
then $\eta'$ is $p_0, \ldots, p_{i-1} \cup \{0^1\}, p_{i+1}, \ldots, p_n$ (that is, $\eta'$ is the result of merging the
now-position to the previous position), and for each $j \in K^+$, if $j^1 \in p_{i-1}$, then
$u'(j) = u(j) + 1$ else $u'(j) = u(j)$. Besides, if $i = 1$ (i.e., the now-position is
merged to $p_0$; in this case, $\eta'$ is a regulated pattern), then $u'(0) = u(0) + 1$ else
$u'(0) = u(0)$,

- (the case when $\eta$ is a split-pattern) if $i \ge 0$ and $|p_i| > 1$, then $\eta'$ is the result of
splitting $0^1$ from the now-position. That is, if $i > 0$, $\eta'$ is $p_0, \ldots, p_{i-1}, \{0^1\}, p_i - \{0^1\}, p_{i+1}, \ldots, p_n$.
However, if $i = 0$, $\eta'$ is $p_0 - \{0^1\}, p_1, \ldots, p_n, \{0^1\}$. In either
case, $u' = u$.

If $next(\eta, u) = (\eta', u')$, $\eta'$ is called the next pattern of $\eta$, written $Next(\eta)$.
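
The two cases of $next$ translate directly into code. The Python sketch below is one possible encoding, assuming a pattern is a list of frozensets of indices `(i, b)` (standing for $i^b$) and a discrete valuation is a list of integers; the name `next_step` is ours. It replays the merge and split steps on the pattern of Example 1 and then iterates until the pattern recurs.

```python
NOW = (0, 1)  # the index 0^1 of the auxiliary clock


def next_step(eta, u):
    """One clock-progress step on (pattern, discrete valuation)."""
    i = next(m for m, p in enumerate(eta) if NOW in p)   # now-position
    u2 = list(u)
    if len(eta[i]) == 1:
        # Merge-pattern: i > 0 here, because 0^0 always sits in p_0.
        for (j, b) in eta[i - 1]:
            if b == 1 and j >= 1:
                u2[j] += 1          # clock x_j reaches an integer value
        if i == 1:
            u2[0] += 1              # the auxiliary clock reaches an integer
        merged = frozenset(eta[i - 1] | {NOW})
        return eta[:i - 1] + [merged] + eta[i + 1:], u2
    # Split-pattern: detach 0^1 from its position, u is unchanged.
    rest = frozenset(eta[i] - {NOW})
    if i > 0:
        return eta[:i] + [frozenset({NOW}), rest] + eta[i + 1:], u2
    return [rest] + eta[1:] + [frozenset({NOW})], u2     # i == 0: wrap around


eta = [frozenset(p) for p in
       ({(0, 0)}, {(2, 0), (1, 1)}, {(0, 1)}, {(1, 0), (2, 1)})]
u = [1, 2, 3]                        # integral parts of v_1 = (1.6, 2.9, 3.1)

eta1, u1 = next_step(eta, u)         # merge step
eta2, u2 = next_step(eta1, u1)       # split step

# Keep stepping until the starting pattern comes back.
ring_eta, ring_u, steps = eta, u, 0
while True:
    ring_eta, ring_u = next_step(ring_eta, ring_u)
    steps += 1
    if ring_eta == eta:
        break
```

For this merge-pattern with $n = 3$, the loop stops after $2n = 6$ steps, and every discrete clock has advanced by exactly one.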

According to Example 1, we visualize a pattern $\eta$ as a circle. Applications of $Next$
can be regarded as moving $0^1$ along the circle, by performing merge-operations and
split-operations alternately. After enough applications of $Next$, $0^1$ will return
to the original now-position after moving through the entire circle. That is, for each
pattern $\eta$, $Next^m(\eta) = \eta$, where $m = 2n$ (resp. $m = 2(n+1)$) if $\eta$ is a merge-pattern
(resp. split-pattern). The sequence $\eta, Next(\eta), \ldots, Next^m(\eta)$ is called a pattern ring.
Notice that $next^m(\eta, u) = (\eta, u + 1)$ for each $u$. On a pattern ring, merge-patterns and
split-patterns appear alternately.

Besides clock progresses, clock resets are the other form of clock behaviors.

Example 2. Take $v_0$ and $v_1$ as in Example 1. Consider $v_1' = (1.6_{0^1}, 0_{1^1}, 3.1_{2^1})$ that is
the result of resetting $x_1$ in $v_1$. The pattern of $(v_0, v_1')$ is $\{0^0\}, \{2^0\}, \{0^1, 1^1\}, \{1^0, 2^1\}$,
which is the result of moving $1^1$ (the index of $x_1$ in $v_1$) into the now-position $\{0^1\}$ of
the pattern $\eta$ of $(v_0, v_1)$ (see Example 1).

Let $r \subseteq K^+$ be (a set of) clock resets. Denote $v\downarrow_r$ to be the result of resetting each
clock $x_i \in r$ (i.e., $i \in r$). That is, for each $i \in K$, if $i \in r$, then $(v\downarrow_r)(i) = 0$ else
$(v\downarrow_r)(i) = v(i)$. Functions $reset_r : \Phi \times (N^+)^{k+1} \to \Phi \times (N^+)^{k+1}$ for $r \subseteq K^+$
describe how a pattern changes after clock resets. Given any discrete valuation $u$ and
any pattern $\eta = p_0, \ldots, p_n$ with the now-position being $p_i$ for some $i$, $reset_r(\eta, u)$ is
defined to be $(\eta', u')$ such that,

- $\eta'$ is $p_0 - r^1, \ldots, p_{i-1} - r^1, p_i \cup r^1, p_{i+1} - r^1, \ldots, p_n - r^1$ where $r^1 = \{j^1 : j \in r\} \subseteq K^1$.
Therefore, $\eta'$ is the result of bringing every index in $r^1$ into the
now-position. Notice that some of the positions $p_m - r^1$ may be empty after moving
indices in $r^1$ out of $p_m$, for $m \ne i$. In this case, these positions are removed from
$\eta'$ (to guarantee that $\eta'$ is well defined),

- for each $j \in K$, if $j \in r$, then $u'(j) = 0$ else $u'(j) = u(j)$.

If $reset_r(\eta, u) = (\eta', u')$, $\eta'$ is written as $Reset_r(\eta)$.
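
The reset operation is equally short to express. The Python sketch below is a minimal encoding under the same assumptions as before: a pattern is a list of frozensets of indices `(i, b)`, a discrete valuation is a list of integers, and `reset` is our own name for $reset_r$. The first call reproduces Example 2; the last shows a position that becomes empty and is dropped.

```python
NOW = (0, 1)  # the index 0^1 of the auxiliary clock


def reset(eta, u, r):
    """Move every index j^1 with j in r into the now-position; zero u there."""
    r1 = frozenset((j, 1) for j in r)
    out = []
    for p in eta:
        q = (p | r1) if NOW in p else (p - r1)
        if q:                         # positions emptied by the move are dropped
            out.append(frozenset(q))
    u2 = [0 if j in r else x for j, x in enumerate(u)]
    return out, u2


eta = [frozenset(p) for p in
       ({(0, 0)}, {(2, 0), (1, 1)}, {(0, 1)}, {(1, 0), (2, 1)})]

# Example 2: reset x_1 in v_1 = (1.6, 2.9, 3.1).
eta_r, u_r = reset(eta, [1, 2, 3], {1})

# Resetting both clocks.
eta_all, u_all = reset(eta, [1, 2, 3], {1, 2})

# A position holding only a reset index disappears (one clock besides x_0).
small = [frozenset(p) for p in ({(0, 0)}, {(1, 1)}, {(0, 1)}, {(1, 0)})]
small_r, small_u = reset(small, [0, 4], {1})
```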

Given an initialized pair $(v_0, v)$ and $0 < \delta \in D^+$. Assume the patterns of $(v_0, v)$
and $(v_0, v + \delta)$ are $\eta$ and $\eta'$, respectively. We say $v$ has no pattern change for $\delta$ if, for
all $0 \le \delta' \le \delta$, $(v_0, v + \delta')$ has the same pattern. We say $v$ has one pattern change
for $\delta$ if $Next(\eta) = \eta'$ (recall $Next(\eta) \ne \eta$) and, for all $0 \le \delta' < \delta$, $(v_0, v + \delta')$ has
pattern $\eta$, or, for all $0 < \delta' \le \delta$, $(v_0, v + \delta')$ has pattern $\eta'$. We say $v$ has $n$ pattern
changes for $\delta$ with $n > 1$, if there are positive $\delta_1, \ldots, \delta_n$ in $D^+$ with $\Sigma_{1 \le i \le n} \delta_i = \delta$
such that $v + \Sigma_{1 \le i \le j} \delta_i$ has one pattern change for $\delta_{j+1}$, for each $j = 0, \ldots, n-1$.
The following lemma states that both $next$ and $reset_r$ are “correct”.

Lemma 2. For all patterns $\eta$ and $\eta'$, for all $r \subseteq K^+$, and for all discrete valuations $u$
and $u'$, the following (1) and (2) hold:

(1). (correctness of $next$) $next(\eta, u) = (\eta', u')$ iff there exist an initialized pair
$(v_0, v)$ and $0 < \delta \in D^+$ such that

(1.1). $\eta$ is the pattern of $(v_0, v)$ and $\eta'$ is the pattern of $(v_0, v + \delta)$,

(1.2). $u = \lceil v \rceil$ and $u' = \lceil v + \delta \rceil$,

(1.3). $v$ has one pattern change for $\delta$. In particular, if $\eta$ is a merge-pattern, then for
all $0 \le \delta' < \delta$, $\eta$ is the pattern of $(v_0, v + \delta')$. If, however, $\eta$ is a split-pattern, then for
all $0 < \delta' \le \delta$, $\eta'$ is the pattern of $(v_0, v + \delta')$,

(2). (correctness of $reset_r$) $reset_r(\eta, u) = (\eta', u')$ iff there exist an initialized pair
$(v_0, v)$ such that

(2.1). $\eta$ is the pattern of $(v_0, v)$ and $\eta'$ is the pattern of $(v_0, v\downarrow_r)$,

(2.2). $u = \lceil v \rceil$ and $u' = \lceil v\downarrow_r \rceil$.

(3). For any fixed initialized pair $(v_0, v)$ and fixed $0 < \delta \in D^+$, there is a unique
finite number $n$ such that $v$ has $n$ pattern changes for $\delta$. In particular, when $\delta = 1$, the
number $n$ is exactly the length of the pattern ring starting from the pattern of $(v_0, v)$.

(4). The number $n$ in (3) can be uniformly bounded for each $\delta$. That is, for any fixed
$\delta \in D^+$, there is a finite number $m$ such that, for any initialized pair $(v_0, v)$, $v$ has at
most $m$ pattern changes for $\delta$.

(5). For any fixed initialized pair $(v_0, v)$, the pattern of $(v_0, v)$ is a merge-pattern
iff there is a $0 < \delta \in D^+$ such that $v$ has no pattern change for $\delta$.

4 Clock Constraints and Patterns 

An atomic clock constraint (over clocks $x_1, \ldots, x_k$, excluding $x_0$) is a formula in the
form of $x_i - x_j \# d$ or $x_i \# d$ where $0 \le d \in N^+$ and $\#$ stands for $<, >, \le, \ge, =$. A
clock constraint $c$ is a Boolean combination of atomic clock constraints. Denote $C$ to be
the set of all clock constraints (over clocks $x_1, \ldots, x_k$). We say $v \in c$ if clock valuation
$v$ (for $x_0, \ldots, x_k$) satisfies clock constraint $c$.

Any clock constraint $c$ can be written as a Boolean combination $l(c)$ of clock constraints
over discrete clocks $\lceil x_1 \rceil, \ldots, \lceil x_k \rceil$ and fractional orderings $\lfloor x_i \rfloor \# \lfloor x_j \rfloor$ and
$\lfloor x_i \rfloor \# 0$. Therefore, testing $v \in c$ is equivalent to testing $\lceil v \rceil$ and the fractional
orderings on $\lfloor v \rfloor$ satisfying $l(c)$.
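
For a single atomic constraint this rewriting is elementary. The Python sketch below spells out one possible form of $l(c)$ for the constraints $x_i - x_j < d$ and $x_i < d$, and compares it with direct evaluation on a small grid of rational values. The explicit formulas are our own derivation rather than quotations from the paper, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import floor


def split(x):
    """Return (integral part, fractional part) of a non-negative rational."""
    n = floor(x)
    return n, x - n


def diff_lt(xi, xj, d):
    """Decide x_i - x_j < d from integral parts and one fractional ordering."""
    a, f = split(xi)
    b, g = split(xj)
    return a - b < d or (a - b == d and f < g)


def single_lt(xi, d):
    """Decide x_i < d; for integer d only the integral part matters."""
    a, _ = split(xi)
    return a < d


# Exhaustive comparison with direct evaluation on 0, 0.25, ..., 4.
grid = [Fraction(n, 4) for n in range(17)]
agree = all(
    diff_lt(x, y, d) == (x - y < d) and single_lt(x, d) == (x < d)
    for x, y, d in product(grid, grid, range(4))
)
```

The other comparison operators can be handled in the same way, and a Boolean combination of atomic constraints is rewritten constraint by constraint.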

Assume $v$ has pattern $\eta$. We use $c^{\eta}$ to denote the result of replacing fractional
orderings in $l(c)$ by the truth values given by $\eta$. $c^{\eta}$ is a clock constraint (over discrete
clocks). The following lemma can be observed.

Lemma 3. (1). For any initialized pair $(v_0, v)$, any pattern $\eta \in \Phi$, if $(v_0, v)$ has
pattern $\eta$, then, for any clock constraint $c \in C$, $v \in c$ iff $\lceil v \rceil \in c^{\eta}$. (2). For any initialized
pair $(v_0, v)$ and any $0 < \delta \in D^+$, if $v$ has at most one pattern change for $\delta$, then, for
any clock constraint $c \in C$, $\forall 0 \le \delta' \le \delta\,(v + \delta' \in c)$ iff $v \in c$ and $v + \delta \in c$. (3). For
any initialized pairs $(v_0^1, v_1)$ and $(v_0^2, v_2)$, if $(v_0^1, v_1) \approx (v_0^2, v_2)$, then, for any $c \in C$,
$v_1 \in c$ iff $v_2 \in c$.

Consider two initialized pairs $(v_0^1, v_1)$ and $(v_0^2, v_2)$ such that $(v_0^1, v_1) \approx (v_0^2, v_2)$.
From Lemma 3(3), any test $c \in C$ will not tell the difference between $v_1$ and $v_2$.
Assume $v_1$ can be reached from a valuation $v^1$ via a clock progress by an amount of
$\delta_1$, i.e., $v^1 + \delta_1 = v_1$. We would like to know whether $v_2$ can be reached from some
valuation $v^2$ also via a clock progress but probably by a slightly different amount of $\delta_2$
such that $(v_0^1, v^1)$ and $(v_0^2, v^2)$ are still equivalent ($\approx$). We also expect that for any test
$c$, if during the progress of $v^1$, $c$ is consistently satisfied, then so is $c$ for the progress
of $v^2$. The following lemma concludes that these, as well as the parallel case for clock
resets, can be done. This result can be used later to show that if $v_1$ is reached from $v_0^1$
by a sequence of transitions that repeatedly perform clock progresses and clock resets,
then $v_2$ can be also reached from $v_0^2$ via a very similar sequence such that no test $c$ can
tell the difference on the two sequences.

Lemma 4. For any initialized pairs $(v_0^1, v_1)$ and $(v_0^2, v_2)$ with $(v_0^1, v_1) \approx (v_0^2, v_2)$,

(1). for any $0 < \delta_1 \in D^+$, for any clock valuation $v^1$, if $v^1 + \delta_1 = v_1$, then
there exist $0 < \delta_2 \in D^+$ and clock valuation $v^2$ such that (1.1). $v^2 + \delta_2 = v_2$ and
$(v_0^1, v^1) \approx (v_0^2, v^2)$, (1.2). $v^1$ is initial iff $v^2$ is initial, $v^1 = v_0^1$ iff $v^2 = v_0^2$, and for any
$c \in C$, $v^1 \in c$ (resp. $v_1 \in c$) iff $v^2 \in c$ (resp. $v_2 \in c$), (1.3). for any clock constraint
$c \in C$, $\forall 0 \le \delta' \le \delta_1\,(v^1 + \delta' \in c)$ iff $\forall 0 \le \delta' \le \delta_2\,(v^2 + \delta' \in c)$.

(2). for any $r \subseteq K^+$, for any clock valuation $v^1$, if $v^1\downarrow_r = v_1$ then there exists a
valuation $v^2$ such that (2.1). $v^2\downarrow_r = v_2$ and $(v_0^1, v^1) \approx (v_0^2, v^2)$, (2.2). same as (1.2).

5 Pushdown Timed Automata 

A pushdown timed automaton (PTA) $A$ is a tuple $\langle S, \{x_1, \ldots, x_k\}, Inv, R, \Gamma, PD \rangle$,
where $S$ is a finite set of states, $x_1, \ldots, x_k$ are (dense) clocks. $Inv : S \to C$ assigns
a clock constraint over clocks $x_1, \ldots, x_k$, called an invariant, to each state. $R : S \times S \to C \times 2^{K^+}$
assigns a clock constraint over clocks $x_1, \ldots, x_k$, called a reset
condition, and a subset of clocks, called clock resets, to a (directed) edge in $S \times S$. $\Gamma$ is
the stack alphabet. $PD : S \times S \to \Gamma \times \Gamma^*$ assigns a pair $(a, \gamma)$ with $a \in \Gamma$ and $\gamma \in \Gamma^*$,
called a stack operation, to each edge in $S \times S$. A stack operation $(a, \gamma)$ replaces the
top symbol $a$ of the stack with a string (possibly empty) in $\Gamma^*$. A timed automaton is a
PTA without the pushdown stack.

The semantics of $A$ is defined as follows. A configuration is a triple $(s, v, w)$ of a
state $s$, a clock valuation $v$ on $x_0, \ldots, x_k$ (where $x_0$ is the auxiliary clock), and a stack
word $w \in \Gamma^*$. $(s_1, v_1, w_1) \to_A (s_2, v_2, w_2)$ denotes a one-step transition of $A$ if one
of the following conditions is satisfied:

- (a progress transition) $s_1 = s_2$, $w_1 = w_2$, and $\exists 0 < \delta \in D^+$, $v_2 = v_1 + \delta$ and for
all $\delta'$ satisfying $0 \le \delta' \le \delta$, $v_1 + \delta' \in Inv(s_1)$. That is, a progress transition makes
all the clocks synchronously progress by amount $\delta > 0$, during which the invariant
is consistently satisfied, while the state and the stack content remain unchanged.

- (a reset transition) $v_1 \in Inv(s_1) \wedge c$, $v_1\downarrow_r = v_2 \in Inv(s_2)$, and $w_1 = aw$, $w_2 = \gamma w$
for some $w \in \Gamma^*$, where $R(s_1, s_2) = (c, r)$ for some clock constraint $c$ and
clock resets $r$, and $PD(s_1, s_2) = (a, \gamma)$ for some stack symbol $a \in \Gamma$ and string
$\gamma \in \Gamma^*$. That is, a reset transition, by moving from state $s_1$ to state $s_2$, resets
every clock in $r$ to 0 and keeps all the other clocks unchanged. The stack content
is modified according to the stack operation $(a, \gamma)$ given on edge $(s_1, s_2)$. Clock
values before the transition satisfy the invariant $Inv(s_1)$ and the reset condition $c$;
clock values after the transition satisfy the invariant $Inv(s_2)$.

We write $\leadsto_A$ to be the transitive closure of $\to_A$. Given two valuations $v_0^1$ and $v_1$,
two states $s_0$ and $s_1$, and two stack words $w_0$ and $w_1$, assume the auxiliary clock $x_0$
starts from 0, i.e., $v_0^1$ is initial. The following result is surprising. It states that, for any
initialized pair $(v_0^2, v_2)$ with $(v_0^1, v_1) \approx (v_0^2, v_2)$, $(s_0, v_0^1, w_0) \leadsto_A (s_1, v_1, w_1)$ if and
only if $(s_0, v_0^2, w_0) \leadsto_A (s_1, v_2, w_1)$. This result implies that, from the definition of $\approx$,
for any fixed $s_0, s_1, w_0$ and $w_1$, the pattern of $(\lfloor v_0^1 \rfloor, \lfloor v_1 \rfloor)$ (instead of the actual values
of $\lfloor v_0^1 \rfloor$ and $\lfloor v_1 \rfloor$), the integral values $\lceil v_0^1 \rceil$, and the integral values $\lceil v_1 \rceil$ are sufficient to
determine whether $(s_0, v_0^1, w_0)$ can reach $(s_1, v_1, w_1)$ in $A$. The proof is an induction
on the length of $(s_0, v_0^1, w_0) \leadsto_A (s_1, v_1, w_1)$ using Lemma 4 and Lemma 3.

Lemma 5. Let $A$ be a PTA. For any states $s_0$ and $s_1$, any two initial clock valuations
$v_0^1$ and $v_0^2$, any two clock valuations $v_1$ and $v_2$, and any two stack words $w_0$
and $w_1$, if $(v_0^1, v_1) \approx (v_0^2, v_2)$, then, $(s_0, v_0^1, w_0) \leadsto_A (s_1, v_1, w_1)$ iff $(s_0, v_0^2, w_0) \leadsto_A (s_1, v_2, w_1)$.



Example 3. It is time to show an example to convince the reader that Lemma 5 indeed
works. Consider a timed automaton shown in Figure 1. Let $v_0^1 = (0, 4.98, 2.52)$,
$v_1^1 = (5.36, 2.89, 7.88)$. $(s_1, v_0^1) \leadsto_A (s_2, v_1^1)$ is witnessed by: $(s_1, v_0^1) \to_A$ (progress
by 2.47 at $s_1$) $(s_1, v_3^1) \to_A$ (reset $x_1$ and transit to $s_2$) $(s_2, v_2^1) \to_A$ (progress by 2.89
at $s_2$) $(s_2, v_1^1)$. Take a new pair $v_0^2 = (0, 4.89, 2.11)$, $v_1^2 = (5.28, 2.77, 7.39)$. It is easy
to check $(v_0^1, v_1^1) \approx (v_0^2, v_1^2)$. From Lemma 5, $(s_1, v_0^2) \leadsto_A (s_2, v_1^2)$. Indeed, this is
witnessed by $(s_1, v_0^2) \to_A$ (progress by 2.51 at $s_1$) $(s_1, v_3^2) \to_A$ (reset $x_1$ and transit to
$s_2$) $(s_2, v_2^2) \to_A$ (progress by 2.77 at $s_2$) $(s_2, v_1^2)$. These two witnesses differ slightly
(2.47 and 2.89, vs. 2.51 and 2.77). We choose 2.77 and 2.51 by looking at the first witness
backwardly. That is, $v_2^2$ is picked such that $(v_0^1, v_2^1) \approx (v_0^2, v_2^2)$. Then, $v_3^2$ is picked
such that $(v_0^1, v_3^1) \approx (v_0^2, v_3^2)$. The existence of $v_2^2$ and $v_3^2$ is guaranteed by Lemma 4.
Finally, according to Lemma 4 again, $v_3^2$ is able to go back to $v_0^2$. This is because $v_3^1$
goes back to $v_0^1$ through a one-step transition and $v_0^1$ is initial.

Fig. 1. An Example Timed Automaton A.
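
The arithmetic behind Example 3 can be checked mechanically. The Python sketch below replays both witnesses (progress, reset of $x_1$, progress) and tests the equivalence $\approx$ at every stage, assuming the same conventions as in Section 3: $\lceil \cdot \rceil$ is the integral part, $\lfloor \cdot \rfloor$ the fractional part, and an index $i^b$ is encoded as `(i, b)`. The guards and invariants of the automaton in Figure 1 are not modelled, and all function names are ours.

```python
from fractions import Fraction
from math import floor


def frac(x):
    return x - floor(x)


def pattern(v0, v1):
    """Pattern of (v0, v1), computed from the relative representations."""
    groups = {}
    for b, v in enumerate((v0, v1)):
        f0 = frac(1 - frac(v[0]))
        for i, x in enumerate(v):
            f = f0 if i == 0 else frac(frac(x) + f0)
            groups.setdefault(f, set()).add((i, b))
    return [frozenset(groups[f]) for f in sorted(groups)]


def ints(v):
    return [floor(x) for x in v]


def equivalent(pair_a, pair_b):
    """Same pattern and same integral parts."""
    return (pattern(*pair_a) == pattern(*pair_b)
            and ints(pair_a[0]) == ints(pair_b[0])
            and ints(pair_a[1]) == ints(pair_b[1]))


def run(v0, d1, d2):
    """Progress by d1, reset x_1, progress by d2; returns (v3, v2, v1)."""
    v3 = [x + d1 for x in v0]
    v2 = [v3[0], Fraction(0), v3[2]]
    v1 = [x + d2 for x in v2]
    return v3, v2, v1


def vec(*xs):
    return [Fraction(x) for x in xs]


a0 = vec("0", "4.98", "2.52")
b0 = vec("0", "4.89", "2.11")
a3, a2, a1 = run(a0, Fraction("2.47"), Fraction("2.89"))
b3, b2, b1 = run(b0, Fraction("2.51"), Fraction("2.77"))
```

The two runs end in the valuations stated in the example, and the pairs taken at matching stages are equivalent, which is what the backward construction relies on.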

Now, we express $\leadsto_A$ in a form that treats the integral parts and the fractional
parts of clock values separately. Given a pattern $\eta \in \Phi$, for any discrete valuations $u_0$
and $u_1$, and any stack words $w_0$ and $w_1$, define $(s_0, u_0, w_0) \leadsto^{\eta}_A (s_1, u_1, w_1)$ to be
$\exists v_0 \exists v_1 (v_0(0) = 0 \wedge \lceil v_0 \rceil = u_0 \wedge \lceil v_1 \rceil = u_1 \wedge (v_0, v_1) \in \eta \wedge (s_0, v_0, w_0) \leadsto_A (s_1, v_1, w_1))$.

Lemma 6. Let $A$ be a PTA. For any states $s_0$ and $s_1$, any initialized pair $(v_0, v_1)$,
and any stack words $w_0$ and $w_1$, $(s_0, v_0, w_0) \leadsto_A (s_1, v_1, w_1)$ iff
$\bigvee_{\eta \in \Phi} (v_0(0) = 0 \wedge (\lfloor v_0 \rfloor, \lfloor v_1 \rfloor) \in \eta \wedge (s_0, \lceil v_0 \rceil, w_0) \leadsto^{\eta}_A (s_1, \lceil v_1 \rceil, w_1))$.

Once we give a characterization of $\leadsto^{\eta}_A$, Lemma 6 immediately gives a characterization
for $\leadsto_A$. A decidable characterization of $\leadsto^{\eta}_A$ is shown in the next section.

6 The Pattern Graph of a Timed Pushdown Automaton 

Let $A = \langle S, \{x_1, \ldots, x_k\}, Inv, R, \Gamma, PD \rangle$ be a PTA specified in the previous section.
The pattern graph $G$ of $A$ is a tuple $\langle S \times \Phi, \{y_0, \ldots, y_k\}, E, \Gamma \rangle$ where $S$ is the states in
$A$, $\Phi$ is the set of all patterns. A node is an element in $S \times \Phi$. Discrete clocks $y_0, \ldots, y_k$
are the integral parts of the clocks $x_0, \ldots, x_k$ in $A$. $E$ is a finite set of (directed) edges
that connect pairs of nodes. An edge can be a progress edge, a stay edge, or a reset edge.
A progress edge corresponds to progress transitions in $A$ that cause one pattern change.
A stay edge corresponds to progress transitions in $A$ that cause no pattern change. Since
a progress transition can cause no pattern change only from a merge-pattern, a stay edge
connects a merge-pattern to itself. A reset edge corresponds to a reset transition in $A$.
Formally, a progress edge $e_{s,\eta,\eta'}$ that connects node $(s, \eta)$ to node $(s, \eta')$ is in the form
of $\langle (s, \eta), c, (s, \eta') \rangle$ such that $c = Inv(s)$, $\eta' = Next(\eta)$ (thus $\eta \ne \eta'$). A stay edge
$e_{s,\eta,\eta}$, with $\eta$ being a merge-pattern, that connects node $(s, \eta)$ to itself is in the form
of $\langle (s, \eta), c, (s, \eta) \rangle$ such that $c = Inv(s)$. A reset edge $e_{s,\eta,s',\eta',r,(a,\gamma)}$ that connects node
$(s, \eta)$ to node $(s', \eta')$ is in the form of $\langle (s, \eta), c, r, a, \gamma, (s', \eta') \rangle$ where $R(s, s') = (c, r)$
and $PD(s, s') = (a, \gamma)$. $E$ is the set of all progress edges, stay edges, and reset edges
wrt $A$. Obviously, $E$ is finite.

A configuration of $G$ is a tuple $(s, \eta, u, w)$ of state $s \in S$, pattern $\eta \in \Phi$, discrete
valuation $u \in (N^+)^{k+1}$ and stack word $w \in \Gamma^*$. $(s, \eta, u, w) \to^e_G (s', \eta', u', w')$
denotes a one-step transition through edge $e$ of $G$ if the following conditions are satisfied:

- if $e$ is a progress edge, then $e$ takes the form $\langle (s, \eta), c, (s, \eta') \rangle$ and $s' = s$, $u \in c^{\eta}$,
$u' \in c^{\eta'}$, $next(\eta, u) = (\eta', u')$ and $w = w'$. Here $c^{\eta}$ and $c^{\eta'}$ are called the pre-
and the post- (progress) tests on edge $e$, respectively.

- if $e$ is a stay edge, then $e$ takes the form $\langle (s, \eta), c, (s, \eta) \rangle$ and $s = s'$, $u \in c^{\eta}$, $u = u'$,
$\eta = \eta'$ and $w = w'$. Here $c^{\eta}$ is called the pre- and the post- (stay) tests on edge
$e$.

- if $e$ is a reset edge, then $e$ takes the form $\langle (s, \eta), c, r, a, \gamma, (s', \eta') \rangle$ and
$u \in (c \wedge Inv(s))^{\eta}$, $u' \in Inv(s')^{\eta'}$, $reset_r(\eta, u) = (\eta', u')$ and $w = aw''$, $w' = \gamma w''$
for some $w'' \in \Gamma^*$ (i.e., $w$ changes to $w'$ according to the stack operation). Here
$(c \wedge Inv(s))^{\eta}$ and $Inv(s')^{\eta'}$ are called the pre- and the post- (reset) tests on edge
$e$, respectively.

We write $(s, \eta, u, w) \to_G (s', \eta', u', w')$ if $(s, \eta, u, w) \to^e_G (s', \eta', u', w')$ for some $e$.
The binary reachability $\leadsto_G$ of $G$ is the transitive closure of $\to_G$.

The pattern graph $G$ simulates $A$ in a way that the integral parts of the dense clocks
are kept but the fractional parts are abstracted as a pattern. Edges in $G$ indicate how
the pattern and the discrete clocks change when a clock progress or a clock reset occurs
in $A$. However, a progress transition in $A$ could cause more than one pattern change.
In this case, this big progress transition is treated as a sequence of small progress
transitions such that each causes one pattern change (and therefore, each small progress
transition in $A$ can be simulated by a progress edge in $G$). We first show that the
binary reachability $\leadsto_G$ of $G$ is NPCM. Observe that discrete clocks $y_0, \ldots, y_k$ are the
integral values of dense clocks $x_0, \ldots, x_k$. Even though the dense clocks progress
synchronously, the discrete clocks may not be synchronous (i.e., incrementing one discrete
clock by 1 does not necessarily cause all the other discrete clocks to be incremented
by the same amount). The proof has two parts. In the first part of the proof, a technique
is used to translate $y_0, \ldots, y_k$ into another array of discrete clocks that are synchronous.
In the second part of the proof, $G$ can be treated as a discrete PTA [15] by replacing
$y_0, \ldots, y_k$ with the synchronous discrete clocks. Therefore, Lemma 7 follows from the
fact, established there, that the binary reachability of a discrete PTA is NPCM.






Lemma 7. For any PTA $A$, the binary reachability $\leadsto_G$ of the pattern graph $G$ of $A$
is NPCM. In particular, if $A$ is a timed automaton, then the binary reachability $\leadsto_G$ is
Presburger.

The following lemma states that G faithfully simulates A when the fractional parts 
of dense clocks are abstracted away by a pattern. The if-part of the lemma uses Lemma 
0 The only-if-part of the lemma is based upon the argument that a one-step transition 
of A, when the pattern abstraction is used, can be simulated by a sequence of transitions 
of G.

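Lemma 8. Let $A$ be a PTA with pattern graph $G$. For any $s_0, s_1 \in S$, $\eta \in \Phi$, $w_0, w_1 \in \Gamma^*$,
and $(u_0, u_1)$ with $u_0(0) = 0$, $(s_0, u_0, w_0) \leadsto^{\eta}_A (s_1, u_1, w_1)$ iff
$(s_0, init(\eta), u_0, w_0) \leadsto_G (s_1, \eta, u_1, w_1)$.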

Now, we conclude this section by claiming that $\leadsto^{\eta}_A$ is NPCM by combining
Lemma 7 and Lemma 8.

Lemma 9. For any PTA $A$ and any fixed pattern $\eta \in \Phi$, $\leadsto^{\eta}_A$ is NPCM. In particular,
if $A$ is a timed automaton, then $\leadsto^{\eta}_A$ is Presburger.



7 A Decidable Binary Reachability Characterization and
Automatic Verification

Recall that PTA $A$ actually has clocks $x_1, \ldots, x_k$. $x_0$ is the auxiliary clock. The binary
reachability $\leadsto^B_A$ of $A$ is the set of tuples $\langle s, v_1, \ldots, v_k, w, s', v_1', \ldots, v_k', w' \rangle$ such that
there exist $v_0 = 0$, $v_0' \in D^+$ satisfying $(s, v_0, \ldots, v_k, w) \leadsto_A (s', v_0', \ldots, v_k', w')$. The
main theorem of this paper gives a decidable characterization for the binary reachability
as follows. The proof uses Lemma 6 and Lemma 9.

Theorem 1. The binary reachability $\leadsto^B_A$ of a PTA $A$ is (D + NPCM)-definable.
In particular, if $A$ is a timed automaton, then the binary reachability $\leadsto^B_A$ can be
expressed in the additive theory of reals (or rationals) and integers.

The importance of the above characterization for $\leadsto^B_A$ is that, from Lemma 1(4), the
emptiness of (D + NPCM)-definable predicates is decidable. From Theorem 1 and
Lemma 1 (3)(4), we have,

Theorem 2. The emptiness of $l \cap \leadsto^B_A$ with respect to a PTA $A$ for any mixed linear
relation $l$ is decidable.

The emptiness of $l \cap \leadsto^B_A$ is called a mixed linear property of $A$. Many interesting
safety properties (or their negations) for PTAs can be expressed as a mixed linear
property. For instance, consider the following property of a PTA $A$.

“for any two configurations $\alpha$ and $\beta$ with $\alpha \leadsto^B_A \beta$, if the difference between $\beta_{x_3}$
(the value of clock $x_3$ in $\beta$) and $\alpha_{x_1} + \alpha_{x_2}$ (the sum of clocks $x_1$ and $x_2$ in $\alpha$) is greater
than the difference between $\#_a(\alpha_w)$ (the number of symbol $a$ appearing in the stack
word in $\alpha$) and $\#_b(\beta_w)$ (the number of symbol $b$ appearing in the stack word in $\beta$),
then $\#_a(\alpha_w) - 2\#_b(\beta_w)$ is greater than 5.”

The negation of this property can be expressed in the form required by Theorem 2.
Thus, this property can be automatically verified. Notice that this property cannot be
verified by using results in |Bl and (even when clocks are ignored) in Em. When $A$
is a timed automaton, by Theorem 1, the binary reachability $\leadsto^B_A$ can be expressed in
the additive theory of reals (or rationals) and integers. Notice that this characterization
is essentially equivalent to the one given by Comon and Jurski m in which $\leadsto^B_A$ can
be expressed in the additive theory of reals augmented with a predicate telling whether
a term is an integer. Because the additive theory of reals and integers is decidable (see
I'TOl for a procedure), we have,

Theorem 3. The truth value for any closed formula expressible in the (first-order)
additive theory of reals (or rationals) augmented with a predicate $\leadsto^B_A$ for a timed
automaton $A$ is decidable. (also shown in unji)



8 Conclusions 

In this paper, we consider PTAs that are timed automata augmented with a pushdown 
stack. By introducing the concept of a clock pattern and using an automata-theoretic 
approach, we give a decidable characterization of the binary reachability of a PTA. The 
results can be used to verify a class of safety properties containing linear relations over 
both dense variables and unbounded discrete variables. 

The results in this paper can be extended to PTAs augmented with reversal-bounded 
counters. A future research issue is to investigate whether the liveness results in O and 
the approximation techniques in ill 61 can be extended to dense clocks. Another issue is 
on the complexity analysis of the decision procedure presented in this paper. However, 
the complexity for the emptiness problem of NPCMs is still unknown, though it is 
believed that it can be derived along Gurari and Ibarra El. The results in this paper 
can be used to implement a model-checker for a subset of the real-time specification 
language ASTRAL EH as well as for a class of real-time programming language with 
procedure calls (such as a timed version of Boolean programs 0 ).